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The  purpose  of  this  study  was  to  examine  a  new  psy- 
chometric screening  procedure  designed  to  discriminate 
intellectually  "gifted"  seventh  graders  from  other  high 
achieving  seventh  graders.  All  179  students  in  the  re- 
search sample  had  been  assessed  to  determine  eligibility 
in  the  Hillsborough  County  Public  Schools  "gifted"  pro- 
gram. An  accurate  and  efficient  screening  procedure  was 
necessary  in  order  to  delete  from  time-consuming,  formal 
assessment  those  students  who  were  unlikely  to  meet  intel- 
lectual criteria  for  gifted  program  placement.  The  Slos- 
son  Intelligence  Test  (SIT),  previously  used  to  screen 
students  referred  to  the  gifted  classes,  had  proved  inef- 
ficient for  this  purpose.   A  plethora  of  research  on 


methods  used  to  identify  gifted  students  revealed  that 
most  methods  were  less  than  accurate. 

The  Comprehensive  Test  of  Basic  Skills  and  Test  of 
Cognitive  Skills  were  item  analyzed  on  a  sample  of  179 
seventh  grade  subjects  using  two  correlational  procedures. 
Results  from  the  item  analyses  yielded  46  items  from  the 
phi  analysis  and  14  from  the  index  of  discrimination  anal- 
ysis that  discriminated  gifted  from  not  gifted  students, 
as  classified  by  the  WISC-R  or  Stanford  Binet.  Cross  val- 
idation was  conducted  on  a  second  sample  of  61  students  to 
determine  (a)  the  accuracy  of  the  new  screening  procedure 
(NP)  compared  to  the  SIT  and  (b)  at  what  cutoff  points 
either  the  NP  or  SIT  was  more  accurate.  In  cross  valida- 
tion, total  scores  of  both  item  subsets  were  compared  with 
SIT.  The  items  obtained  from  the  index  of  discrimination 
analyses  were  found  to  be  the  best  predictors  of  "gifted" 
intelligence  for  the  sample.  There  was  some  ambiguity  re- 
garding an  optimal  cutoff  score;  however,  a  score  .33 
standard  deviations  below  the  mean  score  was  found  to  gen- 
erally yield  most  accurate  predictions. 
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CHAPTER    I 
INTRODUCTION 


The  tremendous  increase  since  1970  in  school  programs 
for  the  intellectually  gifted  has  led  most  states  to  es- 
tablish guidelines  or  requirements  for  program  eligibility 
(Karnes  &  Brown,  1979;  Kolloff  &  Feldhusen,  1984).  One 
common  requirement  is  for  individual  intellectual  evalua- 
tion of  gifted  program  candidates  (Chambers,  Barron,  &, 
Sprecher,  1980;  Sternberg,  1982;  Vernon,  Adamson,  &  Ver- 
non, 1977).  Yarborough  and  Johnson  (1983)  reported  the 
use  of  individual  intelligence  test  minimum  scores  as  an 
eligibility  requirement  in  at  least  73%  of  the  nation's 
gifted  student  programs.  Although  educators  may  recommend 
and  actually  believe  in  the  importance  of  a  broader  view 
of  giftedness,  this  more  narrow  definition  (i.e.,  empha- 
sizing IQ)  is  frequently  employed  because  of  pragmatic 
concerns  ( Jenkins-Freidman,  1982).  The  "intelligence  quo- 
tient" has  therefore  emerged  as  the  primary  criterion  for 
gifted  program  eligibility  (Barklay,  Phillips,  &  Jones, 
1983;  Birch,  1984;  Guilford,  1975;  Karnes,  Edwards,  & 
McCallum,     1986). 

The     individual     intelligence     testing     requirement, 
coupled  with  federal   regulations  mandating   services   for 


children  with  special  educational  needs,  has  placed  a  bur- 
den on  local  school  districts  to  identify  gifted  students 
through  accurate  referral  procedures  and  efficient  use  of 
testing  personnel  (Karnes  &  Brown,  1979).  As  a  result, 
school  systems  are  currently  faced  with  demands  for  intel- 
lectual assessment  of  children  which  are  often  greater 
than  can  be  met  by  the  qualified  examiners  available  (Cro- 
foot  &  Bennett,  1980;  Fell  &  Fell,  1982).  There  is  a 
demand  for  screening  procedures  designed  to  facilitate  the 
referral  process  by  maximizing  the  use  of  individual 
testing  time  while  minimizing  errors  in  identification 
(Jenkins-Friedman,  1982;  Kramer,  Markley,  Shanks,  & 
Ryabik,  1983;  Rust  &  Lose,  1980;  Stephens  &  Gibson,  1963). 
The  Wechsler  Intelligence  Scale  for  Children--Revised 
(WISC-R)  and  Stanf ord-Binet  Intelligence  Scale  (S-B)  are 
the  most  widely  used  individual  tests  of  children's  intel- 
ligence (Bryan  &  Bryan,  1975;  Salvia  &  Ysseldyke,  1978; 
Wikoff,  1978).  Screening  tests  which  estimate  or  predict 
intelligence  scores  on  the  WISC-R  and  Stanf ord-Binet  have 
been  studied  for  use  among  gifted  school  populations  since 
the  1950s  (I'egnato  &  Birch,  1959;  Sheldon  &  Manolakes, 
1954)  . 


Problem 

The  research  problem  addressed  concerned  the  accurate 
identification  of  intellectually  gifted  students  from  a 
screening  procedure  prior  to  administration  of  an  individ- 
ual intelligence  test.  The  expense  of  administering  indi- 
vidual intelligence  tests  has  necessitated  the  screening 
of  potentially  gifted  students  in  an  effort  to  delete  from 
formal  testing  those  who  probably  do  not  possess  gifted 
intelligence  (Rust  &  Lose,  1980;  Stenson,  1982).  Inaccu- 
rate screening  has  resulted  in  not-gifted  students  receiv- 
ing the  time-consuming  tests  and  in  some  gifted  students 
being  excluded  from  testing. 

Screening  procedures  such  as  the  Slosson  Intelligence 
Test  (Dirkes,  Wessels,  Quaforth,  &  Quenon,  1980;  Grossman 
&  Johrtson,  1983;  Karnes  &  Brown,  1979;  Rust  &  Lose,  1980), 
the  Amnions  Quick  Test  (DeFilippis  &  Fulmar,  1980;  Hirsch  & 
Hirsch,  1980),  short  forms  of  the  WISC-R  (Bersoff,  1971; 
Elman,  Blixt,  &  Sawicki,  1981;  Karnes  &  Brown,  1981;  Kil- 
lian  &  Hughes,  1978;  Kramer  et  al.,  1983),  Guilford's 
Structure  of  the  Intellect  (Pearce,  1983),  the  Peabody 
Picture  Vocabulary  Test  (Mize,  Smith,  6.  Callaway,  1979; 
Wright,  1983),  and  group  IQ  tests  (Blosser,  1963;  Grossman 
&  Johnson,  1983;  Pegnato  &  Brich,  1959;  Sheldon  &  Mano- 
lakes,  1954)  have  all  been  shown  to  be  more  or  less  inac- 
curate and/or  inefficient  for  screening  high  ability 
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students.  Chambers  (1960)  and  Schena  (1963)  reported 
somewhat  more  encouraging  results  with  academic  skill 
measures  as  predictors  of  gifted  intelligence.  A  more  ac- 
curate and  efficient  means  for  screening  gifted  students 
needs  to  be  found. 

Gif tedness 

The  characteristics  associated  with  gifted 
intelligence  are  almost  as  numerous  as  the  students 
themselves  (Tuttle  &  Becker,  1980).  Qualitative  trait 
differences  between  gifted  and  not-gifted  children  are 
indicated  frequently  in  the  literature  (Barrington,  1979; 
Dirkes,  1981;  Gensley,  1975;  Male  &  Perrone,  1979;  Ricca, 
1984;  Ryan,  1982;  Sternberg,  1982);  however,  essential  to 
a  discussion  of  intellectual  giltedness  in  children  is 
their  classification  according  to  an  If)  test  cutoff  score. 
Gifted  classifications  by  IQ  cutoff  scores  are  employed  in 
an  attempt  to  objectify  classification  criteria  and  school 
placement  decisions.  Classification  by  IQ  score  may  mis- 
leadingly  imply  that  qualitative  differences  between 
gifted  and  not-gifted  children  are  necessarily  demarcated 
by  an  artificial  cutoff  (Braden,  1985).  In  this  study 
cutoff  scores  are  used  to  quantify  gifted  intelligence  ac- 
cording to  educational  criteria  and  not  to  define  learning 
style,  motivation,  or  other  personality  traits  associated 
with  "gif tedness." 


Purpose 

The  primary  purpose  of  this  study  was  to  determine 
whether  a  large,  group-administered  achievement/ability 
test  battery  possesses  items  that,  in  combination,  yield  a 
score  that  accurately  predicts  gifted  classification  as 
measured  by  the  WISC-R  or  Stanf ord-Binet .  A  secondary 
purpose  of  this  research  was  to  determine  if  the  new 
screening  procedure  (NP)  classifies  gifted  and  not-gifted 
seventh  graders  more  accurately  than  the  Slosson  Intelli- 
gence Test  (SIT)  and  at  what  cutoffs  these  classifications 
are  most  accurate. 

The  new  screening  procedure  developed  Lor  this  study 
consisted  of  a  subset  of  items  selected  from  norm  refer- 
enced, group-administered  tests  of  academic  aptitude  and 
achievement.  The  aptitude  test  used  in  this  study  was  the 
Test  of  Cognitive  Skills  (TCS)  and  the  achievement  battery 
was  the  Comprehensive  Test  of  Basic  Skills  (CTBS).  The 
TCS  and  CTBS  have  been  normal  on  the  same  sample  and  are 
typically  administered  in  concurrent  testing  sessions. 

Research  Questions 

The  following  research  questions  were  addressed: 
1.    Can  an  accurate  predictor  of  gifted  1Q  classifi- 
cation on  the  WISC-R/S-B  be  derived  from  an  instrument 


composed  of  items  on  the  CTBS  and  TCS  in  a  situation  in 
which  giftedness  is  viewed  as  a  dichotomous  variable? 

2.  Is  the  NP  more  accurate  than  the  SIT  in  classi- 
fying gifted  and  not-gifted  seventh  graders? 

3.  At  what  cutoff  point(s)  is  the  NP  more  accurate 
than  the  SIT  most  accurate  in  classifying  gifted  and  not- 
gifted  seventh  graders? 

Definition  of  Terms 

Comprehensive  Test  of  Basic  Skills  (CTBS).  The  CTBS 
is  a  "series  of  norm-referenced,  objective-based  tests  for 
kindergarten  through  twelfth  grade.  The  series  is  de- 
signed to  measure  achievement  in  the  basic  skills  commonly 
found  in  state  and  district  curricula"  (CTB/McGraw-Hill , 
1984,  p.  1).  At  the  junior  high  school  levels  the  content 
areas  are  reading,  spelling,  language,  mathematics,  refer- 
ence skills,  science,  and  social  studies. 

Full  Scale  IQ  (FS-IQ).  Full  Scale  IQ  is  the  derived 
intelligence  quotient  on  the  Wechsler  Intelligence  Scale 
for  Children--Revised  and  on  the  Stanf ord-Binet  Intelli- 
gence Scale. 

General  intelligence.  General  intelligence  is  a  set 
of  general  cognitive  operations  measured  as  the  overall 
ability  required  for  success  on  IQ  tests.   General 


intelligence  is  comprised  ol  those  traits  commonly 
measured  by  IQ  test  items. 

Gifted  intelligence.  Gifted  intelligence  is  measured 
as  an  intelligence  quotient  that  falls  in  the  Very  Supe- 
rior category  or  two  standard  deviations  above  the  test 
mean.  For  this  research,  gifted  IQ=130  -1  SEM  (where 
SEM=3  IQ  points)  on  the  WISC-R,  and  IQ  =  132  -1  SEM  (where 
SEM  =  5  IQ  points)  on  the  Stanf ord-Binet,  or  IQ=127.  (This 
definition,  which  includes  a  SEM,  is  based  on  guidelines 
from  Hillsborough  County  Florida  School  District). 

II  igh-ach i overs.  Students  who,  because  of  superior 
academic  performance,  have  been  referred  for  gifted  pro- 
gram testing,  but  who  are  assessed  as  not  intellectually 
gifted  are  referred  to  as  high-achievers. 

Intelligence.  Intelligence  is  "the  aggregate  global 
capacity  of  the  individual  to  act  purposefully,  to  think 
rationally,  and  to  deal  effectively  with  his  environment" 
(Wechsler,  1958,  p.  7).  Intelligence  is  comprised  of  in- 
tellective factors  such  as  "abstract  reasoning,  verbal, 
spatial,  numerical,  and  other  factors"  (Wechsler,  1950,  p. 
78),  and  nonintellective  factors  consisting  of  "capacities 
and  bents  dependent  upon  temperament  and  personality  which 
are  .  .  .  factors  of  personality  itself"  (Wechsler,  1950, 
p.  78). 

Intelligence  quotient.  An  intelligence  quotient  is  a 
derived  total  or  Full  Scale  score  on  an  intelligence  test. 


Performance  IQ  (P-IQ).  Performance  IQ  is  a  subscale 
IQ  score  on  the  Wechsler  Intelligence  Scale  for  Children-- 
Revised  that  represents  the  examinee's  nonverbal  reasoning 
or  perceptual  organization  (Kaufman,  1975). 

Reliability.  Reliability  is  the  squared  population 
correlation  between  the  individual's  obtained  score  and 
the  individual's  hypothetical  true  score.  Reliability  is 
"the  proportion  of  true-score  variance  in  scores  on  a  par- 
ticular test  at  the  time  it  was  taken"  (Jensen,  1980,  p. 
260)  . 

Slosson  Intelligence  Test  (SIT).  The  SIT  is  an  indi- 
vidually administered  intelligence  test  which  requires 
little  specialized  training  to  administer,  only  about  20 
minutes  to  administer  and  score,  and  yields  IQ  scores 
"which  are  close  approximations  to  the  Stanf ord-Binet  IQ 
[scores]"  (Slosson  &  Jensen,  1982,  p.  1). 

Stanf ord-Binet  Intelligence  Scale  (S-B).  The  S-B  is 
one  of  the  two  individually  administered  intelligence 
tests  used  in  this  study  to  measure  gifted  intelligence. 
(See  page  41  for  a  detailed  description.) 

Test  of  Cognitive  Skills  (TCS).  The  TCS  is  "an  abil- 
ity test  designed  to  assess  a  student's  academic  aptitude 
and  thereby  predict  the  student's  level  of  success  in 
school.  Emphasis  in  TCS  is  placed  on"  .  .  .  problem  solv- 
ing,  discovering   relationships,   evaluating,   and 


remembering"  (CTB/McGraw-Hi 1  1  ,  1984,  p.  1).  The  TCS  is 
administered  concurrently  with  the  CTBS. 

Val idity.  Validity  refers  to  the  appropriateness  of 
inferences  from  test  scores  or  other  forms  of  assessment. 
Validity  deals  with  how  faithfully  the  scores  represent  a 
domain  of  skill,  knowledge,  or  of  a  trait  being  measured 
(American  Psychological  Association,  1985). 

Verbal  IQ ( V-IQ) .   Verbal  IQ  is  a  subscale  IQ  score 

on  the  Wechsler  Intelligence  Scale  for  Children--Revised 
that  represents  the  examinee's  verbal  reasoning  or  verbal 
comprehension  (Kaufman,  1975). 

Wechsler  Intelligence  Scale  for  Children—Revised 
( WISC-R).   The  WISC-R  is  the  individually  administered 
IQ  test  predominantly  used  in  this  study  to  measure  gifted 
intelligence.   (See  page  35  for  a  detailed  description.) 

Theoretical  Rationale 

Salvia  and  Ysseldyke  (1978)  have  pointed  out  that 
there  is  a  hypothetical  domain  of  items  that  may  be  used 
to  assess  intelligence  and  that  items  may  be  drawn  from 
various  sources.  For  example,  WISC-R  Information  subtest 
items  are  drawn  from  a  domain  of  achievement  oriented 
items  that  measure  specific  content  of  learning  acquired, 
in  large  part,  through  formal  education.  This  overlapping 
of  achievement  and  aptitude  test  item  content  has  been 
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demonstrated  empirically  by  Anastasi  (1976).  Her  examina- 
tion of  the  content  of  several  current  instruments  classi- 
fied as  achievement  and  intelligence  tests  revealed  simi- 
larity in  their  content.  Supporting  this  finding  she 
contended  that  it  has  long  been  known  that  IQ  tests  corre- 
late about  as  highly  with  achievement  tests  as  different 
IQ  tests  correlate  with  each  other.  Further,  one  of  the 
most  frequently  employed  means  of  validating  IQ  tests  is 
to  compare  them  with  measures  of  achievement. 

In  another  attempt  to  show  some  common  elements  of 
achievement  and  intelligence,  Gronlund  (1976)  compared 
factors  measured  by  both  reading  readiness  tests  and  IQ 
tests.   These  elements  included 

1.  visual  discrimination--identif ying  similarities 
and  differences  in  words  or  pictures; 

2.  auditory  di scr imination-- identify ing  similar- 
ities and  differences  in  spoken  words; 

3.  verbal  comprehension--demonstrat ing  an  under- 
standing of  the  meaning  of  words,  sentences,  and 
directions;  and 

4.  copy ing--demonstrating  skills  in  reproducing 
geometric  forms. 

In  his  analysis  of  the  WISC-R  Verbal  Scale,  Kaufman 
(1979)  identified  item  content  that  reflects  properties  of 
achievement  tests.   For  example,  the  Information  subtest 
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of  the  WISC-R  measures  acquired  knowledge  and  is  influ- 
enced by  outside  reading  and  school  learning.  The  Sim- 
ilarities subtest  was  also  found  to  be  subject  to  reading 
and  vocabulary  knowledge.  Likewise,  the  other  Verbal 
Scale  subtests  were  found  to  have  strong  components  of 
acquired  knowledge. 

In  one  of  a  very  few  studies  in  which  IQ  test  item 
content  was  related  to  specific  academic  skills,  Washing- 
ton, Engelmann,  and  Bereiter  (1969)  conducted  an  item 
analysis  of  the  Stanf ord-Binet  Intelligence  Scale  and  at- 
tempted to  construct  an  academic  curriculum  from  it. 
After  the  curriculum  was  presented  to  students  an  achieve- 
ment test  was  administered.  Results  showed  that  the 
prelearned  S-B  items  were  positively  correlated  with  post- 
test  achievement  items  for  particular  learning  tasks.  In 
a  second  phase  of  the  study  no  pretest  was  administered. 
However,  subsequent  to  the  curriculum  presentation  and 
post-test  administration  the  Stanf ord-Binet  was  given. 
The  achievement  test  results  were  found  to  accurately  pre- 
dict S-B  scores  in  terms  of  items  responses.  Results  sug- 
gested content  validity  across  the  IQ  and  achievement 
measures . 

A  number  of  other  researchers  have  investigated  the 
relationship  between  intelligence  and  achievement  measures 
and  found  them  to  be  positively  correlated  (Hale,  1978; 
Hartlage  &.  Steele,   1977;  Reschley  &  Reschley,   1979; 
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Schwarting  &  Schwarting,  1977)  or  to  possess  significant 
overlap  in  their  factor  loadings  (Carroll,  1966;  Dean, 
1977;  Grossman  &  Johnson,  1982;  Horn,  1970;  Stewart  &  Mor- 
ris, 1977;  Undheim,  1976;  Vernon,  1961,  1969;  Wikoff, 
1978)  . 

The  TCS,  in  terms  of  its  title  and  stated  purpose, 
encompasses  a  construct  closely  related  to  the  WISC-R  and 
Stanf ord-Binet  in  that  all  three  were  designed  and  have 
been  found  to  predict  academic  attainment  (Hurrocks,  1964; 
CTB/McGraw-Hill,  1984;  Terman  &  Merrill,  1973;  Wechsler, 
1974).  As  described  in  its  1983  Technical  Report,  the  TCS 
subtests  (Sequences,  Analogies,  Memory,  and  Verbal  Reason- 
ing) load  on  factors  consistent  with  those  described  in 
WISC-R  and  Stanf ord-Binet  studies  (Kaufman,  1979). 

The  TCS,  according  to  its  constructors,  measures  "a 
number  of  cognitive  abilities  included  in  various 
theories,  however,  like  the  WISC-R  and  Stanf ord-Binet,  em- 
phasis is  placed  on  the  kinds  of  reasoning  and  retention 
skills  necessary  for  school  success"  (CTB/McGraw-Hi 1 1 , 
1984,  p.  5).  The  CTBS  similarly  measures  a  variety  of 
academic  skill  areas  shown  to  be  positively  correlated 
with  the  TCS.  The  TCS  and  CTBS  together  appear  to  assess 
a  theoretical  construct  common  to  the  WISC-R  and  Stanford- 
Binet.   Therefore,  the  items  chosen  from  the  CTBS  and  TCS 
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should  conform  to  that  construct,  thus  allowing  discrim- 
ination of  gifted  from  not-gifted  students  as  do  the  WISC- 
R  or  Stanford-Binet. 

Support  for  the  common  item  content  of  the  CTBS,  TCS, 
WISC-R,  and  Stanford-Binet  will  be  provided  here  by 
illustrations  of  actual  items  found  on  these  tests.  Items 
are  categorized  arbitrarily  according  to  subtest  classifi- 
cation and  to  face  commonalities.  A  brief  description  and 
reference  location  will  be  given  for  those  non-verbal  test 
items  that  cannot  be  readily  reproduced  in  this  format. 
Items  indicated  are  those  designed  for  average  and  above 
average  seventh  graders.  Test  items  are  printed  in  bold- 
face . 

1.  WISC-R,  Vocabulary-instructions:   What  does     • 

mean? 

rivalry 

CTBS,  Vocabulary  (instructions:  Choose  the  word  or 
phrase  that  means  the  same,  .  .  .,  as  the  underlined 
word . ) 

their  opponent 

A.  foe 

B.  employee 

C.  architect 

D.  assistant 

2.  WISC-R,  Arithmetic 

Tony  bought  a  second  hand  bicycle  for  $28.00. 
He  paid  2/3  of  what  the  bicycle  cost  new.  How 
much  did  it  cost  new? 

CTBS,  Mathematics  Concepts  and  Applications 

Homer's  recipe  will  make  48  sugar  cookies.  He 
made  3/4  of  this  recipe  for  a  party.  How  many 
cookies  were  for  each  of  the  18  people  at  the 
party? 


i  4 


3.  Stanford-Binet,    Arithmetic   Reasoning 

If  a  man's  salary  is  $20  a  week  and  he  spends 
$14  a  week,  how  long  will  it  take  him  to  save 
$300? 

CTBS,  Mathematics  Concepts  and  Applications 

To  pay  for  groceries,  Scott,  Marvin,  and  Carol 
each  gave  the  clerk  $1.35.  The  clerk  gave  them 
$.45  in  change.  How  much  did  the  groceries 
cost? 

P.    $1.80 

G.    $3.60 

H.    $4.05 

J.    $4.50 

4.  Stanford-Binet,  Vocabulary 

What  does  "Brunette"  mean? 

CTBS,  Vocabulary  (instructions:  Choose  the  word  or 
phrase  that  means  the  same  ...  as  the  underlined 
word. ) 

successful  merchant 

P.    parade 

G.    business 

H.    customer 

J .    shopkeeper 

5.  WISC-R  Picture  Arrangement  (instructions:  "...  I 
want  you  to  arrange  these  pictures  in  the  right  order 
to  tell  a  story  that  makes  sense.") 

TCS,  Sequence  (instructions:  ".  .  .  choose  the  part 
that  would  continue  the  pattern  or  sequence.")  Vari- 
ous visual  stimuli  are  presented  such  as  letters, 
numbers  or  geometric  shapes. 

6.  WISC-R,  Similarities 

In  what  way  are  a  telephone  and  a  radio  alike? 

TCS,  Verbal  Reasoning  (paraphrased  instructions:   The 
words  in  the  top  and  bottom  rows  are  related  in  the 
same  way.   Find  the  word  that  completes  the  bottom 
row  of  words.) 

radio  electricity  music 

paper  newspaper 

F.  ink 

G.  story 

H.    reporter 
I.    typewriter 
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7.  Stanford-Binet ,  Problems  of  Fact 

"An  Indian  who  had  come  to  town  for  the  first 
time  in  his  life  saw  a  boy  riding  along  the 
street.  As  the  boy  rode  by  the  Indian  said, 
"The  white  boy  is  lazy;  he  walks  sitting  down! 
What  was  the  boy  riding  on  that  caused  the  In- 
dian to  say,  x  He  walks  sitting  down'? 

TCS,  Verbal  Reasoning  (instructions:   find  the  true 
statement) 

All  bicycles  have  gears. 

Some  bicycles  have  ten  speeds. 

Maria  has  a  bicycle. 

I*.    Maria  likes  her  bicycle. 

G.    Maria's  bicycle  has  gears. 

H.    Maria's  bicycle  goes  too  fast. 

J.    Maria's  bicycle  has  ten  speeds. 

8.  Stanford-Binet,    Induction 

(This  is  a  sequential  test  in  which  paper  is 
folded  and  holes  cut  in  it  by  the  examiner.  The 
student  must  deduce  a  pattern  to  predict  how 
many  holes   will    result   from   each   cut.) 

TCS,    Sequences 

(Students  are  presented  sequential  patterns  that 
are  incomplete.  The  student  must  deduce  the 
pattern    and    predict    the    final    pattern.) 


Need    for   This    Study 

Teachers  at  the  seventh-grade  level  typically  experi- 
ence more  difficulty  identifying  potentially  gifted  stu- 
dents than  do  teachers  at  lower  grade  levels  (Schnell, 
1982).  Consequently,  junior  high  school  teachers  refer  a 
larger  proportion  of  not-gifted  students  for  evaluation 
than  do  teachers  of  elementary  school  students.  This 
phenomenon  is  believed  to  result  from  more  limited  contact 
between  individual  teachers  and  students  and  from  the  fact 
that   the   pool   of   potentially  gifted   students   from   which 
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junior  high  school  teachers  choose  contains  none  of  the 
students  who  have  been  identified  as  gifted  during 
previous  years.  Junior  high  school  teachers  must  base 
their  referrals  on  "the  best  of  the  rest."  Clearly  the 
most  advantageous  time  to  identify  students  for  a  junior 
high  school  curriculum  for  the  gifted  is  when  they  enter 
seventh  grade. 

The  new  procedure  (NP)  was  developed  by  selecting 
items  from  popular  group  tests  administered  at  seventh- 
grade,  measuring  both  achievement  and  cognitive  ability. 
The  rationale  for  choosing  screening  items  from  standar- 
dized group  tests  is  that  this  approach  to  student  screen- 
ing is  both  time-  and  cost-efficient.  All  students  in  the 
population  school  district  take  the  CTBS  and  TCS  annually 
and  the  screening  data  are  readily  available  without  addi- 
tional tests  or  testing  time  being  needed. 

It  was  believed  that  of  the  460  CTBS  and  TCS  ques- 
tions typically  administered  to  seventh  graders,  there 
existed  a  subset  of  items  that  would  accurately  predict 
gifted  IQ  on  the  WISC-R  or  S-B.  Because  the  number  of 
CTBS/TCS  items  is  large,  sampling  of  a  wide  range  of 
skills  and  abilities  is  possible.  As  the  diversity  of  the 
items  increases,  so  does  the  general  ability  measured  by 
the  total  test  (Kaufman,  1979). 
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The  utility  of  finding  a  small  pool  of  items  that 
correlates  highly  with  gifted  intelligence  is  that  (a) 
current  forms  of  the  CTBS/TCS  selected  items  may  be  re- 
tained for  group  administration  to  future  gifted  class 
candidates,  (b)  if  a  gifted  candidate  is  not  present  for 
CTBS/TCS  testing  the  entire  test  will  not  have  to  be 
administered  as  a  gifted  screening,  and  (c)  the  procedure 
of  analyzing  a  test  in  this  way  might  be  useful  for  pre- 
dicting intelligence  (or  other  traits)  among  other  popula- 
tions, utilizing  these  or  other  tests. 

Overview  of  the  Remainder  of  the  Study 

The  subsequent  content  of  this  study  is  divided  into 
four  chapters.  In  Chapter  II  a  review  of  related  liter- 
ature is  presented.  A  description  of  the  methodology  used 
for  the  research  comprises  Chapter  III.  Research  results 
are  presented  in  Chapter  IV  and  results  are  discussed  in 
Chapter  V. 


CHAPTER  II 
LITERATURE  REVIEW 


Support  for  the  Problem 


The  psychometric  screening  of  "gifted  intelligence" 
is  beset  by  problems  of  not  only  time-  and  cost-efficiency 
but  of  predictive  accuracy.  In  the  relevant  literature  it 
is  suggested  that  these  problems  exist  for  a  variety  of 
screening  methods  and  procedures.  These  problems  are  not 
of  recent  origin,  however.  As  early  as  1959,  Pegnato  and 
Birch  found  that  sufficient  psychological  services  were 
rarely  available  to  test  all  of  the  gifted  class  candi- 
dates, thereby  necessitating  procedures  for  screening 
prior  to  formal  testing.  Accordingly,  the  authors  con- 
ducted an  investigation  of  the  "relative  efficiency  and 
effectiveness"  (p.  300)  of  seven  procedures  for  locating 
gifted  children  in  junior  high  schools:  teacher  ratings, 
class  rank,  creative  ability  in  art  or  music,  student 
council  membership,  superiority  in  mathematics,  group 
achievement,  and  group  IQ.  Seven  hundred  eighty-one  met- 
ropolitan school  district  students  were  selected  for  par- 
ticipation in  the  study  on  the  basis  of  high  ratings  in 
one  or  more  of  the  seven  categories.  All  of  the  partic- 
ipants received  the  Stanford-Bine t.    Scores  on  this 


L9 


intelligence  test  were  used  as  a  criterion  reference  for 
the  designation  of  those  children  who  were,  indeed, 
gifted.  For  each  of  the  seven  screening  procedures, 
effectiveness  was  judged  by  the  percentage  of  gifted  chil- 
dren located;  efficiency  was  defined  as  the  ratio  between 
total  number  of  gifted  students  and  students  predicted  as 
gifted.  Of  the  781  subjects,  yi  (6.5%  oi  the  school  popu- 
lation) were  judged  to  be  gifted.  These  results  indicated 
that,  among  the  seven  methods,  group  IQ  and  achievement 
tests  were  the  better  predictors,  providing  the  best  pos- 
sible combination  of  effectiveness  with  efficiency.  Other 
methods,  such  as  honor  role  inclusion,  were  fairly  effec- 
tive, but  their  efficiency  was  poor. 

Even  though  Pegnato  and  Birch  (1959)  were  largely  un- 
successful in  finding  an  effective  predictor  of  gifted  IQ, 
a  substantial  amount  of  research  has  focused  on  screening 
the  gifted  since  that  time.  The  following  sections  of 
this  literature  review  are  concentrated  on  four  procedures 
for  gifted  student  identification  that  involve  group  IQ 
tests,  IQ  short  forms,  IQ  screening  tests,  and  achievement 
test  scores. 

Group  IQ  Tests 

The  largely  unsuccessful  attempt  by  Pegnato  and  Birch 
(1959)  to  predict  gifted  intelligence  using  group  IQ  tests 
was  followed  one  year  later  by  a  similar  study.   Chambers 
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(1960)  sought  a  screening  instrument  for  use  in  a  Michigan 
school  district.  Using  the  IPAT  (Cattell's  test  of  gen- 
eral intelligence),  the  California  Test  of  Mental  Matu- 
rity, the  SRA  Primary  Abilities  Test,  The  Kuhlman-Anderson 
Intelligence  Test,  and  the  WISC,  Chambers  tested  39  chil- 
dren in  grades  three  through  six.  For  each  screening 
test,  a  cutoff  was  calculated  above  which  all  gifted  stu- 
dents (WISC  IQ>124)  would  be  identified.  The  accuracy  of 
each  screening  procedure  was  established  at  100%,  and  the 
efficiency  was  then  determined  based  on  the  number  of  not- 
gifted  students  misclassif ied  by  the  screening  procedure 
as  gi  ftod.  The  results  revealed  th.it  t  ho  SKA  tent  and  the 
Ku 1 hman-Anderson  could  be  ranked  respectively  as  the  most 
and  least  efficient,  and  that  between  20%  and  57%  of  the 
students  predicted  as  gifted  were  not. 

Three  years  after  Chambers'  study,  Blosser  (1963) 
tested  187  ninth  graders  on  the  Henmon-Nel son  and  Otis 
group  intelligence  tests.  The  research  sample  had  a  mean 
IQ  of  120  on  the  Stanf ord-Binet  with  a  range  of  98  to  153. 
The  results  indicated  that  of  the  36  students  predicted  as 
gifted  by  the  Otis,  only  13  (36%)  were  identified  as  such 
by  the  Stanf ord-Binet.  On  the  Henmon-Nelson  13  of  26  stu- 
dents (57%)  were  correctly  predicted  as  gifted.  Because 
19%  of  the  gifted  students  were  not  identified  by  either 
group  test,  both  tests  proved  to  be  poor  predictors  of 
gif tedness . 
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More  recently,  Harrington  (1982)  also  found  that 
group  IQ  tests  tend  to  underestimate  the  IQs  of  many  in- 
tellectually gifted  students.  According  to  Harrington, 
for  every  student  identified  as  gifted  on  a  group  IQ  test, 
one  gifted  child  is  not  referred.  Harrington  suggested 
that  the  higher  the  ability  level,  the  greater  the  dis- 
crepancy between  individual  and  group  IQ  scores.  He  also 
found  that  a  child's  IQ  may  vary  by  as  much  as  30  points 
between  group  and  individual  tests.  further,  because 
there  may  be  a  very  small  number  of  items  at  the  greater 
difficulty  levels  on  group  tests,  a  child  may  have  to  per- 
form perfectly  to  be  predicted  as  gifted. 

Intelligence  Quotient  Short  Forms 

So-called  IQ  short  forms  are  comprised  of  abbreviated 
versions  of  individually  administered  standardized  intel- 
ligence tests.  Typically  selected  for  short  forms  are 
subtests  of  the  WISC-R  or  items  from  the  Stanf ord-Binet. 
Short  forms  of  the  Wechsler  Scales  and  Stanf ord-Binet  have 
been  studied  extensively  (Birch,  1955;  Carleton  &  Stacey, 
1954;  Enburg,  Rowley,  &  Stone,  1961;  Findley  &  Thompson, 
1958;  Grossman  &  Galvin,  1987;  Meister  &  Kurko,  1951; 
Nichols,  1962;  Simpson  &  Bridges,  1959;  Wright  &  Sandry, 
1962;  Wade,  Phelps,  &  Falasco,  1986;  Yakowitz  &  Armstrong, 
1955;  Zimet,  Farley,  &  Dahlen,  1985).  However,  it  is  only 
since  1978  that  short  forms  have  been  relatively  widely 
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studied  as  a  method  for  screening  potentially  gifted  stu- 
dents . 

An  early  attempt  to  predict  gifted  IQ  using  a  short 
form  test  was  conducted  by  Thompson  and  Findley  in  1962. 
Finding  that  the  Similarities  (S),  Information  (I),  Pic- 
ture Arrangement  (PA),  Block  Design  (BD),  and  Picture  Com- 
pletion (PC)  WISC  subtests  could  be  effectively  used  for 
this  purpose,  Thompson  and  Findley  published  the  Califor- 
nia Abbreviated  WISC  for  the  Intellectually  Gifted  (CAW- 
IQ)  in  1966). 

In  their  study,  Killian  and  Hughes  (1978)  measured 
the  effectiveness  of  the  Lyman  short  form  (Lorr  &  Meister, 
1942)  and  the  Vocabulary-Block  Design  subtests  of  the 
WISC-R  dyad  for  predicting  IQ  on  the  Stanf ord-Binet  and 
WISC-R  respectively.  Subjects  were  142  students  between 
5-  and  15-years-old  possessing  a  mean  IQ  of  125.  Results 
indicated  a  correlation  of  r=.92  between  the  WISC-R  and  V- 
BD  dyad  whereas  the  Stanf ord-Binet  and  Lyman  scores  were 
correlated  at  r=.78.  Killian  and  Hughes  did  not  present 
results  of  the  actual  number  of  students  correctly  pre- 
dicted as  gifted.  They  did,  however,  indicate  that  32%  of 
the  students  had  short  form/Full  Scale  IQ  score  discrep- 
ancies of  6  points  or  more. 

Employing  a  much  larger  sample  of  students  than  did 
previous  researchers,  Karnes  and  Brown  (1981)  used 
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Si  1 verstein 's  (1970)  method  of  deriving  "the  best  short 
form  combinations"  (p.  169)  to  obtain  an  accurate  gifted 
IQ  predictor.  Sil verstein's  method  takes  subtest  unreli- 
ability into  account  when  measuring  predictive  ability. 
Nine  hundred,  forty-six  gifted  children  ages  6.0  to  16.0 
(X  chronological  age  [CA]  =  9.9)  served  as  subjects. 
Karnes  and  Brown  found  that  the  W1SC  Block  Design  subtest 
was  represented  frequently  in  subtest  combinations  that 
correlated  with  WISC  Full  Scale  IQ.  Supporting  Killian 
and  Hughes'  findings,  the  V-BD  dyad  was  found  to  be  the 
most  accurate  for  predicting  gifted  IQ.  The  use  of  sub- 
test tetrads  was  found  to  be  useful,  increasing  correla- 
tion coefficients  from  .628  to  .734.  Again,  actual  accu- 
racy ratios  were  not  provided  in  the  study. 

Proceeding  under  the  notion  that,  "since  a  short  form 
IQ  test  is  composed  entirely  of  some  subset  of  questions 
of  items  taken  directly  from  a  full-length  IQ  test,  a 
short  form  would  seem  to  be  an  ideal  predictor  of  full- 
length  IQ  test  performance"  (p.  40),  Dirks,  Wessels,  Qua- 
forth,  and  Quenon  (1980)  compared  various  short  form  com- 
binations with  Full  Scale  IQ  on  the  WISC-R.  Subjects  con- 
sisted of  47  fourth  graders  with  a  mean  IQ  of  123  (range  - 
106  to  144).  Twelve  WISC-R  subtest  combinations  were 
studied.  It  was  revealed  that  the  short  form  combinations 
of  Similarities,  Object  Assembly  and  Vocabulary  and  S-OA 
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were  each  good  predictors  of  Full  Scale  IQ.  Although  cor- 
relations on  the  BD  subtest  were  high,  as  shown  in  previ- 
ous studies,  they  tended  to  predict  an  excessive  number  of 
nongifted  students  as  gifted.  The  S-OA  dyad  predicted  8 
of  11  gifted  students  and  4  who  were  not.  The  S-OA-V 
triad  predicted  9  of  11  gifted  students  and  4  who  were 
not. 

Utilizing  the  studies  by  Killian  and  Hughes  (1978) 
and  Dirks  et  al.  (1980),  who  noted  that  V-BD  and  S-OA 
dyads,  respectively,  were  the  most  effective  in  predicting 
Full  Scale  IQ,  Fell  and  Fell  (1982)  evaluated  92  WISC-R 
protocols  of  children  previously  evaluated  as  gifted  pro- 
gram candidates.  The  students  ranged  in  age  from  6-0  to 
11-7  (X  age  =  8.4)  and  possessed  Full  Scale  IQs  of  130  or 
greater.  Eleven  subtest  dyads  were  studied  in  terms  of 
frequency  with  which  each  produced  an  estimated  IQ  >  130. 
Greatest  predictive  accuracy  was  achieved  using  the  S-V 
and  S-OA  dyads.  These  correctly  predicted  62%  as  gifted. 
The  I-BD  dyad  yielded  prediction  ratings  of  only  43%.  Not 
providing  an  exact  number,  the  authors  indicated  that  some 
gifted  children  were  overlooked.  While  results  are  con- 
sistent with  findings  by  Dirks  et  al.,  indicating  that  the 
S-OA  dyad  is  most  effective,  prediction  accuracy  was  much 
lower  in  this  study. 

In  a  fairly  recent  study,  Kramer,  Markley,  Shanks, 
and  Ryabik  (1983)  utilized  Thompson  and  Findley's  (1966) 
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CAW-IQ  on  a  sample  of  73  children,  ages  6-0  to  16-7  (X  age 
=  10-5).  All  subjects  received  the  WISC-R  and  all  were 
analyzed  in  terms  of  the  S,  I,  PA,  BD,  and  PC  subtest  pat- 
tern. Of  the  48  students  predicted  as  gifted,  39  were 
predicted  accurately;  of  25  students  predicted  not  to  be 
gifted,  21  were  correctly  described.  This  subtest  short 
form  was  considered  to  be  a  relatively  accurate  predictor 
of  gifted  IQ. 

Intelligence  Quotient  Screening  Tests 

Individually  administered  intelligence  tests  designed 
to  estimate  mental  ability,  usually  in  20  minutes  or  less, 
have  become  a  widely  used  procedure  for  screening  gifted 
intelligence.  The  Slosson  Intelligence  Test  (Slosson  & 
Jensen,  1982)  is  one  such  screening  test.  It  had  been 
adopted  in  the  Hillsborough  County  Florida  school  district 
for  the  purpose  of  screening  the  gifted.  High  correla- 
tions between  the  SIT  and  the  WISC-R  or  Stanf ord-Binet 
have  been  reported  in  research  findings  (Lawrence  &  Ander- 
son, 1979;  Martin  &  Kidwell,  1977;  Martin  &  Rudolph,  1972; 
Mize,  Smith,  &  Callaway,  1979;  Ritter,  Duffy,  &  Fischman, 
1973;  Slosson  &  Jensen,  1982;  Stewart  &  Jones,  1976). 
However,  the  few  available  studies  conducted  on  gifted 
samples  have  not  supported  the  use  of  the  SIT  for  screen- 
ing . 
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In  the  previously  discussed  study  (see  Chapter  I)  by 
Grossman  and  Johnson  (1983),  the  Otis  Lennon  group  IQ  test 
was  found  to  be  a  better  predictor  of  gifted  achievement 
than  the  SIT  for  high  achieving  students.   Dirks,  Wessels, 
Quaforth,  and  Quenon  (1980)  also  found  the  SIT  to  be  a 
poor  predictor  of  gifted  ability.   They  administered  the 
SIT  to  47  academically  talented  fourth  graders.   The  stu- 
dents were  also  administered  the  WISC-R  to  determine  their 
actual  IQ  scores.   Of  the  11  students  who  were  found  to 
possess  gifted  intelligence  (IQ>130),  only  8  were  identi- 
fied as  such  by  the  SIT.   In  addition,  the  SIT  falsely 
predicted  gifted  intelligence  in  9  of  the  38  nongifted 
children.   The  researchers  concluded  that  the  SIT  alone 
should  not  be  used  to  predict  IQs  of  gifted  children.   The 
SIT  has  also  been  found  to  significantly  overestimate  IQ 
scores.   Machen  (1972)  investigated  the  reliability  and 
concurrent  validity  of  the  SIT  with  the  WISC,  using  5 
gifted  children  ages  9  through  11.   The  results  revealed  a 
significant  correlation  between  the  two  tests,  though  the 
SIT  tended  to  overestimate  the  WISC  by  at  least  one  stan- 
dard deviation.   Additionally,  the  SIT  has  been  shown  to 
underestimate  IQ  scores.   Mize  et  al.  (1979)  found,  in 
their  study  of  207  students  from  all  grade  levels,  that  of 
students  with  above  average  intelligence,  24%  were  overes- 
timated and  24%  were  underestimated  by  11  or  more  IQ 
points  on  the  SIT. 
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In  1979  Karnes  and  Brown  further  examined  the  ten- 
dency of  the  SIT  to  over-  or  underestimate  IQ  scores.  In 
this  study  the  validity  of  the  SIT  in  relation  to  the 
WISC-R  was  assessed  for  a  group  of  79  gifted  children  ages 
6  through  12.  A  SIT-WISC-R  correlation  of  r=.48  was  cal- 
culated; this  coefficient  was  significant  at  the  .001 
level.  The.  authors  also  computed  a  regression  equation 
with  which  to  predict  WISC-R  IQ  from  the  SIT.  Results 
indicated  that  at  the  lower  ranges  of  SIT  scores,  it 
tended  to  underestimate  the  WISC-R,  while  at  the  upper 
ranges  IQ  was  overestimated.  Despite  the  high  correlation 
between  the  two  tests,  it  is  apparent  that  Karnes  and 
Brown  were  not  confident  in  the  SIT's  predictive  ability 
because  they  recommended  using  two  SEMs  for  the  SIT  when 
screening  gifted  IQ  to  ensure  that  most  gifted  students 
are  identified.  To  obtain  95%  accuracy,  a  cutoff  score  of 
105  would  have  been  necessary.  However,  Karnes  and  Brown 
did  not  indicate  how  many  nongifted  students  would  have 
been  predicted  as  gifted  using  a  cutoff  this  low. 

In  a  similar  study,  presented  at  the  Annual  Meeting 
of  the  Alabama  Association  of  School  Psychologists  in 
1983,  Apple  discussed  the  precision  of  the  SIT  in  pre- 
dicting WISC-R  IQ  of  61  gifted  students  ages  6  to  11. 
Differences  in  scores  were  compared  by  use  of  independent 
t-tests.   The  results  supported  the  findings  of  Karnes  and 
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Brown  that  at  the  lower  SIT  ranges  WISC-R  IQs  were  under- 
estimated and  that  at  the  upper  SIT  ranges  the  WISC-R  IQs 
were  overestimated.  Apple  concluded  that  valuable  diag- 
nostic information  yielding  a  qualitative  picture  of  the 
child's  strength  is  omitted  when  the  SIT  alone  is  used  as 
a  screening  indicator. 

Whereas  Karnes  and  Brown  as  well  as  Apple  compared 
SIT  and  WISC-R  scores  for  youngsters  already  placed  in 
gifted  classes,  Rust  and  Lose  (1980)  attempted  to  accu- 
rately screen  potentially  gifted  students  in  first  through 
seventh  grade.  Based  on  teacher  referrals  and  SIT  scores 
of  130  or  above,  438  students  were  found  eligible  for 
WISC-R  evaluation.  Of  these,  132  were  utilized  in  the  re- 
search sample.  According  to  stepwise  regression  equa- 
tions, the  SIT  was  found  to  be  a  significant  predictor  of 
Full  Scale  IQ.  However,  of  the  132  students  predicted, 
only  61  achieved  WISC-R  IQs  of  130  or  above.  Thus,  set- 
ting the  SIT  cutoff  at  130  failed  to  screen  out  54%  of  the 
nongifted  students.  If  a  cutoff  of  134  had  been  used,  as 
suggested  by  Karnes  and  Brown,  42  evaluations  would  have 
been  eliminated.  However,  of  those  42,  12  would  have  been 
gifted.  Karnes  and  Brown  noted  that  while  there  was  a 
high  correlation  between  the  SIT  and  WISC-R,  there  was  a 
great  deal  of  variability  with  individual  cases.  It  was 
concluded  that  in  all  studies  high  error  can  be  expected 
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when  the  SIT  is  used  to  predict  WISC-R  IQ  among  the 
gifted . 

Other  IQ  screening  tests  such  as  Guilford's  Structure 
of  the  Intellect  Test  (SOI)  (Pearce,  1983;  Stenson,  1982), 
Ravens  Progressive  Matrixes  (Pearce,  1983;  Petty  &  Field, 
1980),  The  Peabody  Picture  Vocabulary  Test  (Mize  et  al., 
1979;  Pedriana  &  Bracken,  1982),  and  The  Amnions  Quick  Test 
(Joesting  &  Joesting,  1971;  Kendall  &  Little,  1977;  Nich- 
olson, 1977)  have  been  correlated  with  the  WISC,  WISC-R, 
and  the  Stanf ord-Binet,  In  some  instances  significant 
correlations  have  been  found.  However,  very  few  studies 
have  been  conducted  with  samples  of  gifted  students.  In 
one  such  study,  DeFilippis  and  Fulmer  (1980)  found  that 
the  Ammons  Quick  Test  underestimated  WISC-R  IQ  for  99 
first,  fourth,  and  seventh  graders  with  high  ability. 

In  another  study  involving  samples  of  gifted  stu- 
dents, Wright  (1983)  correlated  WISC  and  Peabody  Picture 
Vocabulary  Test  (PPVT)  scores  of  35  students  referred  by 
teachers  for  gifted  program  testing,  A  correlation  of 
r=.27  was  calculated  and  it  was  found  that  nearly  half  of 
those  who  scored  two  standard  deviations  above  the  PPVT 
mean  were  not  eligible  for  gifted  program  placement  based 
on  WISC-R  IQ  scores.  Wright  recommended  that  the  PPVT  not 
be  used  to  screen  gifted  program  candidates. 

A  third  study  was  conducted  by  Stenson  (1982)  to  de- 
termine the  concurrent  validity  of  the  Structure  of 
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Intellect  (SOI)  Gifted  Screener  with  the  WISC-R.  The  sub- 
jects were  3239  elementary  school  students.  A  multiple 
correlation  of  r=.337  was  significant  at  <.05;  however, 
only  11%  of  the  variance  in  WISC-R  scores  was  explained  by 
the  Gifted  Screener.  No  predictor  variable  contributed  to 
a  significant  multiple  correlation  coefficient  when  Full 
Scale  IQ  or  any  combination  of  WISC-R  subtests  was  used  as 
the  criterion  variable.  Stenson  concluded  that  the  Gifted 
Screener  should  not  be  used  to  predict  WISC-R  IQ  for 
gifted  program  prospects. 

In  1985,  Clarizio  and  Mehrens  evaluated  the  technical 
data  manuals  for  the  SOI  to  determine  the  test's  value  as 
a  screening  test  for  gifted  intelligence.  It  was  con- 
cluded that  "the  SOI  model  has  severe  psychometric  limita- 
tions" (p.  119).  Those  limitations  center  around  poor 
reliability,  inadequate  normative  data,  and  poor  external 
validity  for  many  of  the  factors  measured  by  the  test. 

Achievement  Tests 

In  research  introduced  in  Chapter  I  it  was  suggested 
that  achievement  tests  (CTBS)  and  cognitive  ability  tests 
(TCS,  WISC-R,  S-B)  measure  much  the  same  construct.  In 
the  supportive  literature  were  indications  that  distinc- 
tions between  achievement  tests  and  cognitive  ability 
tests  are  often  unclear.  Correlational  and  factor 
analytic  studies  lend  credence  to  this  contention. 
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Lennon  (1978)  has  found  that  relationships  between 
intelligence  and  achievement  tests  are  so  strong  as  to 
lead  to  the  criticism  that  the  two  types  of  tests  do  not 
measure  anything  different.  Both  tests  measure  what  the 
student  has  learned  (Gronlund,  1976)  and  both  tests  pre- 
dict future  learning  with  similar  degrees  of  success.  IQ 
tests  and  achievement  tests  differ  in  form  but  not  neces- 
sarily in  content  (Mercer,  1979). 

In  a  series  of  studies  in  the  late  1970s  and  1980s 
WISC-K  IQs  were  correlated  with  achievement  subtests  of 
the  Wide  Range  Achievement  Test  (WRAT).  Consistently  high 
correlations  were  found.  Some  of  these  early  studies  are 
summarized  in  Table  1-1. 

Also,  in  1978,  Stedman,  Lawlis,  Cortner,  and  Achten- 
berg  attempted  to  relate  Kaufman's  (1975)  factor  scores  to 
WRAT  attainment  in  a  population  of  76  children,  ages  6  to 
13.  Correlations  were  found  to  be  positive  and  signif- 
icant. 

Yule,  Gold,  and  Busch  (1981)  administered  the  WISC-R 
and  a  battery  of  achievement  tests  to  students  at  age  16 
1/2.  Achievement  measures  included  tests  of  "sentence 
reading,"  spelling,  and  arithmetic.  WISC-R,  Verbal  IQ 
shared  50%  of  the  variance  in  reading,  spelling,  and 
arithmetic.  Correlations  between  Full  Scale  IQ  and 
achievement  were  as  high  as  r=.80. 
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Table  2-1.   Summary  of  Relationship  between 
WISC-R  and  WRAT. 


WRAT 


WISC-R  Reading    Spelling    Arithmetic 


Brooks  (1977)  N=30;  6-10  years 

V..S.  IQ  64a  55  74 

P.S.  IQ  71  70  71 

F.S.  IQ  70  65  76 

Hartlage  and  Steele  (1977)  N=36; 
Mean  age  =  7  yrs  9  months 

V.S.  IQ  75  35  76 

P.S.  IQ  54  33  67 

F.S.  IQ  68  35  76 

Schwarting  and  Schwarting  (1977)  N=282;  6-16  years 

(a)  6-11  yrs 

V.S.  IQ  68  61  69 

P.S.  IQ  63  60  69 

F.S.  IQ  72  65  75 

(b)  12-16  yrs 

V.S.  IQ  74  69  66 

P.S.  IQ  40  34  55 

F.S.  IQ  62  56  66 

Hale  (1978)  N=155;  6-16  years 

V.S.  IQ  54  49  64 

P.S.  IQ  29  26  44 

Full  Scale  correlations  not  quoted. 


aDecimal  points  omitted. 

In  1982  a  follow-up  to  the  studies  summarized  in 
Table  2-1  was  conducted  by  Grossman  and  Johnson.  In  their 
study,  77  students  ages  6  to  16  were  administered  the 
WISC-R  and  the  WRAT.  Factor  scores  were  computed  on  two 
of  Kaufman's  (1975)  factors  (Verbal  Comprehension  and 
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Freedom  from  Distractibility)  and  WRAT  subtests.  A  multi- 
ple regression  analysis  was  computed  wherein  WISC-R  factor 
scores  served  as  conjoint  predictors  and  the  WRAT  standard 
scores  were  employed  as  criterion  variables.  Results  in- 
dicated a  significant  overall  prediction  of  WRAT  reading, 
spelling  and  arithmetic  by  the  two  WISC-R  factors. 

Wright  and  Dappan  (1982)  assessed  250  students  with  a 
mean  age  of  nine  years  on  the  WISC-R  and  WRAT.  Factor 
analysis  showed  a  common  factor  for  subtests  on  both  mea- 
sures. Correlations  between  subtests  from  the  two  tests 
were  as  high  as  r=.60  (on  WISC-R,  Arithmetic  and  WRAT, 
Arithmetic).  Some  other  subtests  correlated  at  .40  to 
.50. 

Literature  concerning  the  overlap  of  individual  tests 
of  intelligence  and  tests  of  achievement  include  studies 
in  which  the  WISC-R  and  the  Peabody  Individual  Achievement 
Test  (PIAT)  (Dunn  &  Markwardt,  1970)  were  examined. 
Wikoff  (1978)  factor  analyzed  the  WISC-R  along  with  the 
PIAT  for  180  referred  children.  Although  the  PIAT  General 
Information  and  Mathematics  subtests  loaded  on  factors 
previously  identified  in  the  structure  of  the  WISC-R,  the 
remaining  subtests  loaded  on  a  separate  factor,  subse- 
quently labeled  Word  Recognition.  The  results  supported 
the  use  of  both  instruments  as  sources  of  mutual  but 
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supplementary  information   in  the  assessment  of   learning 
problems . 

Dean  (1977)  assessed  the  degree  of  redundancy  between 
the  WISC-R  and  the  PIAT  using  a  canonical  correlation 
analysis  with  scores  from  205  referred  children.  The  re- 
sults indicated  that  65%  of  the  functions  of  the  PIAT 
overlapped  with  the  WISC-R  and  that  37%  of  the  functions 
of  the  WISC-R  overlapped  with  the  PIAT.  The  overlap  was 
attributed  to  common  verbal-educational  content.  Dean 
(1982)  found  a  similar  asymmetrical  overlap  between  these 
measures  in  samples  of  100  Anglo  and  100  Mexican-American 
children.  As  in  wikoll  s  factor  analysis,  bothoi  Dean  s 
analyses  showed  the  PIAT  subtests  of  reading  and  spelling 
to  offer  the  greatest  degree  of  information  not  redundant 
with   the   WISC-R. 

Brock  (1982),  finding  that  a  paucity  of  research 
existed  for  factor  analytic  investigations  of  the  WISC-R 
in  combinations  with  individual  achievement  tests,  con- 
ducted such  a  study.  He  factor  analyzed  the  WISC-R,  WRAT, 
and  PIAT  for  183  male  students  in  grades  3  through  6.  An 
attempt  was  made  to  determine  the  traits  or  common  skills 
measured  by  IQ  and  achievement  tests  when  viewed  concom- 
itantly. Four  factors  emerged.  One,  a  numerical  factor, 
was      comprised      of      subtests      from      all      three      tests. 
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Moderately  high  correlations  r  =  .40  to  .50  were  found  be- 
tween some  of  the  other  IQ  and  achievement  subtests. 

Stewart  and  Morris  (1977)  factor  analyzed  the  WISC, 
WAIS,  WRAT,  and  CAT  (California  Achievement  Test)  for  182 
students  ranging  in  age  from  11  to  18.  A  "substantial" 
overlap  of  verbal  intelligence  and  academic  achievement 
was  found.'  Resulting  factors  conformed  reasonably  well  to 
those  of  Kaufman  (1975).  Subtests  from  each  measure  were 
found  to  load  on  each  of  the  IQ  factors. 

In  a  study  wherein  the  abilities  underlying  reading 
readiness  were  identified,  Olsen  and  Rosen  (1971)  factor 
analyzed  three  group  reading  tests  and  the  WISC.  Subjects 
consisted  of  218  first  graders.  The  35  subtests  were  cor- 
related and  the  resulting  matrices  subjected  to  a  prin- 
cipal component  analysis.  Four  common  factors  were  re- 
vealed. In  one  factor,  reading  comprehension  loaded  with 
four  WISC-R  subtests.  In  another,  "writing  letters"  cor- 
related highly  with  WISC-R  Vocabulary.  In  a  third  factor, 
sentence  writing  and  WISC-R  Coding  were  included. 

There  have  emerged  two  camps  of  thought  on  the  issue 
of  reading  skill  acquisition  and  intelligence.  On  one 
hand,  in  some  research  it  has  been  suggested  that  reading 
is  a  function  of  information  processing  or  encoding  skills 
as  opposed  to  being  a  primarily  intellectual  function.  On 
the  other  hand,  in  similarly  focused  factor  analytic 
investigations  reading  has  been  found  to  be  highly  loaded 
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on  a  general  intelligence  factor  and  a  good  predictor  of 
intelligence . 

Researchers  whose  views  represent  the  latter  view- 
point have  supported  the  notion  that  reading  ability  is 
highly  correlated  with  general  intelligence  and  is  a  good 
predictor  of  intelligence.  Jensen  (1981)  reported  a  cor- 
relation of  r  =  .68  between  reading  comprehension  and  Full 
Scale  IQ  for  a  large  sample  of  students.  Other  research- 
ers have  cited  similarly  high  (.60-. 70)  correlations  be- 
tween reading  and  IQ  for  various  samples  of  students  in 
grades  K  through  12  (Brooks,  1977;  Hale,  1978;  Hartlage  & 
Steele,  1977;  Ryan,  1979;  Wikoff,  1978;  Yapp,  1977;  Yule, 
Gold,  &  Busch,  1981).  In  a  literature  review  of  34 
studies,  Hammill  and  McNutt  (1981)  found  a  median  correla- 
tion of  .75  between  measures  of  intelligence  and  achieve- 
ment. 

Reasoning  that  the  most  efficient  gifted  screening 
assessment  would  be  significantly  correlated  with  achieve- 
ment if  giftedness  is  defined  as  superior  school-related 
ability,  Grossman  and  Johnson  (1983)  investigated  the 
Stanford  Achievement  Test.  They  found  a  significant  cor- 
relation with  intelligence  among  46  children  with  SIT  IQs 
above  120. 

In  another  pertinent  study,  Schena  (1963)  found  that 
of  226  sixth  and  seventh  graders  who  scored  two  or  more 
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"levels"  above  the  norm  on  the  Metropolitan  Reading  Test, 
61%  scored  above  130  on  the  Stanford -Binet.  In  his  1984 
study,  Sternberg  found  that  IQ  accounted  for  as  much  as 
25%  of  the  variance  in  scholastic  performance. 

In  two  of  the  few  other  studies  in  which  achievement 
was  correlated  with  intelligence  among  superior  students, 
Mayfield  (1979)  had  573  third  graders  evaluated  in  terms 
of  intelligence,  achievement,  creativity,  and  teacher  per- 
ception of  IQ.  Results  yielded  significant  correlations 
between  intelligence  and  a  wide  range  of  achievement  do- 
mains among  the  student  sample.  Similarly,  Karnes, 
Edwards,  and  McCallum  (1986)  found  a  significant  correla- 
tion between  total  scores  on  the  California  Achievement 
Test  (CAT)  and  WISC-R  Full  Scale  IQs  of  41  gifted  children 
in  grades  four  through  six. 

Thus  the  results  of  this  body  of  literature  comparing 
intelligence  with  achievement  appear  conclusive,  as  in  a 
substantial  number  of  studies  it  is  suggested  that  the  two 
variables  are  fairly  highly  correlated. 

Mallinson  (1963)  attempted  to  uncover  a  relationship 
between  intelligence  and  achievement  in  science  and  math. 
The  SRA  achievement  series  and  the  SRA  Primary  Abilities 
Test  were  given  to  secondary  grade  students.  There  was  a 
resulting  correlation  of  r=.65  between  verbal  ability  and 
science  (facts  and  principles).   Verbal  ability  was  also 
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found  to  have  reasonably  high  correlations  with  factors  of 
arithmetic  achievement. 

In  another  study  in  which  factors  related  to  math 
performance  were  investigated,  Roach  (1979)  reported  a 
significant  correlation  (r=.80)  between  arithmetic 
achievement  and  verbal  IQ  in  third  graders. 

Correlation  coefficients  for  CTBS  and  TCS  subtests 
(CTB/McGraw-Hill,  1984)  were  calculated  using  2813  seventh 
graders.  Coefficients  between  .60  and  .72  were  not  uncom- 
mon. Correlations  between  the  TCS  Total  Score  and  CTBS 
subscales  of  Reading,  Language,  Math,  Social  Studies  and 
Science  were  .71,  .71,  .68,  .65,  .71  and  .65,  respec- 
tively. The  CTBS  and  TCS  Total  Batteries  correlated  at 
r  =  .75.  These  high  correlations  suggest  that  both  tests 
may  measure  similar,  though  operationally  distinguishable, 
constructs.  Support  for  the  contention  that  these  tests 
represent  similar  constructs  may  also  be  found  in  the  cor- 
relations of  TCS  subscale  scores  with  those  scores  of  its 
predecessor,  the  Short  Form  Test  of  Academic  Aptitude 
(SFTAA).  The  range  of  correlations  was  .55  to  .82,  not 
dissimilar  to  those  of  the  TCS  and  CTBS.  The  fact  that 
the  average  correlations  were  positive  means  that  the  sub- 
scales  must  be  measuring  something  in  common  (Jensen, 
1980) . 

In  summary,   there  is  substantial  evidence  that 
achievement  test  scores  and  IQ  test  scores  are  highly 
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correlated.  The  research  also  provides  reason  to  believe 
that  a  common  lactor  underlies  performance  on  both  types 
of  tests. 

Summary 

In  the  literature  relevant  to  the  accuracy  of  various 
procedures  for  screening  intelligence  among  gifted  stu- 
dents, there  have  generally  been  mixed  results.  Some 
encouraging  findings  have  occurred  on  studies  of  achieve- 
ment ratings  and  short-form  IQ  tests.  Group  IQ  tests  have 
tended  to  underestimate  the  IQs  of  some  gifted  children, 
though  they  generally  predicted  gifted  IQ  with  moderate 
accuracy.  Much  less  effective  moans  for  predicting  gifted 
IQ  are  IQ  screening  tests. 

The  preceding  literature  has  focused  on  the  problems 
of  screening  gifted  intelligence  among  the  school  age  pop- 
ulations. Those  procedures  that  have  shown  some  success 
have  been  inconsistent  in  their  findings.  Nearly  all  have 
proved  inefficient  in  terms  of  time  and  cost. 

Instruments  Used  in  Study 

Wechsler  Intelligence  Scale  for  Children-Revised  (WISC-R) 

The  WISC-R  has  been  the  most  widely  administered  test 
of  children's  intelligence  (Bryan  &  Bryan,  1975;  Grossman 


40 

&  Galvin,  1987;  Salvia  &  Ysseldyke,  19  7  8;  Vandiver  &  Van- 
diver,  1979).  In  much  of  the  research  surrounding  this 
instrument  it  is  suggested  that  it  merits  this  distinc- 
tion. Friedes  (1978)  described  the  standardization  of  the 
WISC-R  as  "state  of  the  art"  (p.  232)  and  as  meriting 
"blue  ribbons."  In  addition,  he  noted  as  praiseworthy  the 
high  correlation  coefficients  between  the  WISC-R  and  the 
Stanf ord-Binet. 

Reliability  coefficients  of  internal  consistency  for 
the  WISC-R  Verbal,  Performance,  and  Full  Scale  IQ  scores 
reported  in  the  test  manual  were  obtained  by  utilizing  a 
formula  for  computing  reliability  of  a  composite  group  of 
tests  (Wechsler,  1974).  The  average  reliability  coeffi- 
cients across  the  range  of  age  levels  were  V,  r  =  .9"4;  P, 
r=.90;  and  FS,  r=.96.  The  coefficients  of  individual  sub- 
tests based  on  split-half  or  test-retest  methods  ranged 
from  r=.77  to  .86.  Test-retest  correlations  for  the  Ver- 
bal, Performance,  and  Full  Scale  IQs  ranged  from  r=.90  to 
.95  based  on  a  3-month  interval  between  tests. 

Factor  analytic  research  by  Kaufman  (1979)  has  shown 
that  factors  corresponding  closely  with  the  Verbal  and 
Performance  Scales  of  the  WISC-R  exist.  In  1980,  Karnes 
and  Brown  factor  analyzed  the  WISC-R  on  946  gifted  stu- 
dents ages  6.0  to  16.0.  The  resulting  factors  were 
consistent  with  those  found  by  Kaufman  on  the  normal  popu- 
lation.  Most  verbal  scale  subtests  had  factor  loadings  in 
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Perceptual  Organization.   These  studies  strongly  support 
the  validity  of  the  WISC-R. 

Slosson  Intelligence  Test 

The  SIT  is  an  IQ  test  for  children  and  adults  de- 
signed for  use  by  either  relatively  untrained  examiners  or 
qualified  professionals.  The  SIT  typically  takes  between 
10  and  30  minutes  to  administer. 

New  norms  (1982)  represent  a  significant  departure 
from  procedures  previously  employed  (1961)  for  calculating 
an  IQ  score.  In  norming  the  SIT,  the  Stanf ord-Binet  was 
used  as  the  anchor  test.  Consistent  with  the  1974  revi- 
sion of  the  Stanf ord-Binet ,  ratio  IQs  were  abandoned  in 
favor  of  deviation  IQs.  Frequency  distributions  were  cal- 
culated for  each  of  the  19  chronological  age  ranges  on 
both  IQ  scales.  Then,  utilizing  a  "modified  table  look-up 
approach,"  appropriate  IQs  from  the  Stanf ord-Binet  were 
entered  on  the  developing  SIT  tables.  The  mean  IQ  for  the 
SIT  is  100  and  the  standard  deviation  is  16. 

In  the  SIT  manual,  the  authors  present  evidence  to 
persuade  the  reader  that  the  revised  SIT  IQs  are  equiva- 
lent to  Stanf  ord-Binet  IQs.  This  task  is  undertaken,  in 
large  part,  by  comparing  previous  (1961),  less  positively 
correlated  coefficients  to  newer  data, 

Slosson  and  Jensen  (1982)  stated  that  "the  SIT  is  as 
accurate  as  the  Stanf ord-Binet  in  measuring  a  person's 
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intelligence  when  both  instruments  have  been  properly 
administered"  (p.  16)  and  further  proposed  that  the  SIT 
qualifies  as  an  alternate  form  "of  the  Stanf ord-Binet  be- 
cause the  two  tests  possess  equivalent  means  and  standard 
deviations"  (p.  16).  Based  on  their  dubious  assumption  of 
test  equivalency  between  the  revised  SIT  and  the  S-B,  the 
authors  employed  the  Mean  Absolute  IQ  Difference  (MAD) 
statistic  to  determine  alternate  form  test  reliability  and 
standard  errors  of  measurement.  The  MAD  procedure,  which 
is  meant  to  be  used  only  with  equivalent  forms  of  a  test, 
yields  a  statistic  which  is  approximately  equal  to  the 
standard  deviation  times  8862.  The  authors  did  not  indi- 
cate the  relative  effects  on  the  reliability  and  the  SEMs 
when  the  measures  compared  do  not  strictly  meet  the  crite- 
ria of  alternate  forms,  as  is  apparent  in  this  case.  Nor 
are  there  attempts  to  evaluate  other  kinds  of  test  relia- 
bility. It  might  be  concluded,  therefore,  that  the 
authors'  claim  that  "the  SIT's  reliability  may  be  regarded 
as  not  less  than  .95"  (p.  136)  should  be  interpreted  with 
caution . 

Another  claim  made  in  the  SIT  manual  is  that  the  mean 
difference  between  the  Stanf ord -Binet  and  Slosson  IQ 
scores  is  less  than  one  point,  based  on  the  sample  of 
1,109.  The  procedure  by  which  this  statistic  was  obtained 
entails  computing  the  means  for  IQ  differences  between  the 
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two  tests  Cor  three  IQ  levels  across  four  age  groups:  be- 
low 84,  84-116,  and  above  116.  For  example,  at  age  13-6 
and  above,  mean  differences  between  the  Stanf ord-Binet  and 
SIT  are  -1.41,  -1.10,  and  2.62  at  the  three  IQ  levels.  In 
calculating  the  mean  difference,  negative  means  are  added 
to  positive  means  resulting  in  a  misleadingly  low  overall 
mean  difference.  In  this  example  the  total  mean  differ- 
ence for  age  13-7  and  above  is  -.67.  However,  if  individ- 
ual means  had  been  summed  in  terms  of  nondirectional  devi- 
ation from  zero,  the  mean  difference  would  have  been 
approximately  1.7.  With  regard  to  the  mean  difference  for 
the  entire  sample,  when  the  nondirectional  procedure  is 
used  the  difference  changes  from  -.04  to  approximately 
1.4.  The  mean  scores  are  rendered  even  more  difficult  to 
interpret  because  no  standard  errors  for  the  means  are 
reported. 

In  spite  of  the  apparent  inconsistencies  in  the  new 
SIT  manual,  the  revisions,  particularly  in  its  renorming, 
represent  considerable  improvement  in  the  test's  validity 
and  reliability.  These  test  improvements,  along  with  the 
ease  of  its  administering  and  scoring,  render  the  SIT  a 
test  of  considerable  utility  as  an  intellectual  screening 
procedure . 
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Comprehensive  Test  of  Basic  Skills  (CTBS)/Test  of 
Cognitive  Skills  (TCS] 

The  CTBS  (CTB/McGraw-Hill,  1984)  and  TCS  (CTB/McGraw- 
Hill,  1984)  are  the  tests  from  which  NP  items  were  taken. 
Psychometrically,  these  tests  were  well  suited  to  this  re- 
search. The  appropriateness  of  the  CTBS  and  TCS  for  this 
research  are  supported  by  several  of  their  attributes, 
some  of  which  were  discussed  in  Chapter  I. 

Items  were  chosen  for  both  tests  according  to  item 
response  theory  (IRT)  utilizing  a  three-parameter  logistic 
model.  The  items  were  chosen  according  to  their  ability 
to  (a)  discriminate  high  ability  traits  from  low  ability 
traits,  (b)  discriminate  high  ability  students  and  low 
ability  students  by  matching  item  difficulty  with  student 
total  score,  and  (c)  account  for  guessing  as  an  influence 
on  score  difficulty. 

In  terms  of  content  validity,  the  CTBS  is  designed  to 
measure  understanding  of  a  broad  range  of  concepts  as 
developed  by  various  educational  curricula.  Test  perfor- 
mance reflects  a  student's  skills  in  effective  use  of 
information  explicit  in  categories  derived  from  Bloom's 
taxonomy  (Bloom,  1956).  Item  development  specifications 
were  designed  to  ensure  comprehensive  coverage  of  the  con- 
tent and  process  categories. 

The  TCS  is  designed  to  measure  an  aptitude  construct 
that  can  be  operationally  distinguished  from  the 
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achievement  construct  of  the  CTBS,  based  on  research  con- 
ducted at  McGraw-Hill  by  Buchet  (1974,  cited  in 
TCB/McGraw-Hill ,  1984).  Empirical  criteria  for  distin- 
guishing between  aptitude  and  achievement  measures  were 
derived  by  the  publishers. 

Product  moment  correlations  between  the  four  subtests 
of  the  TCS  were  between  r=.41  to  r  =  .65.  Coefficients  be- 
tween subtests  and  total  score  ranged  from  r=.72  to  r=.85. 
Therefore,  it  was  suggested  that  all  subtests  measure  gen- 
eral intelligence  but  also  measure  independent  factors.  A 
correlation  coefficient  between  the  CTBS  and  TCS  of  r=.78 
was  calculated  on  a  sample  of  seventh  graders. 

Another  attribute  of  the  CTBS  and  TCS  is  the  compre- 
hensive sampling  and  norming  standards  applied.  The  norm- 
ing  samples  contained  approximately  250,000  students  in 
grades  K-12  from  public,  Catholic,  and  other  private 
schools  (CTB/McGraw-Hill,  1984).  School  districts  were 
randomly  chosen  from  four  geographic  regions.  Comprehen- 
sive norming  and  standardization  information  is  available 
in  the  CTBS  and  TCS  Technical  Reports. 

Internal  reliability  coefficients  were  calculated 
according  to  the  Kuder-Richardson  formula  20.  CTBS  relia- 
bility coefficients  ranged  from  .30  to  .96  on  the  10  sub- 
tests (CTB/McGraw-Hill,  1984).  All  subtests  except  Spell- 
ing and  Reference  Skills  had  values  at  or  above  .90.  On 
the  four  TCS  subtests  reliability  coefficients  ranged  from 
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.80  to  .84.  The  TCS  Technical  Reports  provide  reliabil- 
ities on  SEMs  for  subtests  based  on  number  correct  at  each 
grade  level.  Composite  calculations  for  the  total  test 
are  not  provided.  Also  reported  are  bias  studies  and 
tables  indicating  how  test  biases  are  accounted  for  and 
controlled. 

In  summary,  the  CTBS  and  TCS  were  well  suited  for 
this  study  because  of  the  sophisticated  method  utilized  in 
analyzing  items  and  the  tests'  high  validity  and  reliabil- 
ity. Further,  both  tests  employed  sampling  procedures 
designed  to  provide  norms  for  the  entire  U.S.  school  popu- 
lation. Research  has  also  been  conducted  to  aid  in  reduc- 
ing test  bias  for  the  CTBS  and  TCS. 

Stanford-Binet  Intelligence  Scale  (S-B) 

The  third  revision  of  the  Stanford-Binet  (S-B),  pub- 
lished in  1960,  remained  unchanged  in  content  and  format 
through  1985.  A  revised  version  of  the  S-B  was  published 
in  1986.  The  1960  version  was  constructed  by  combining 
forms  L  and  M  of  the  1937  scale  and  eliminating  those 
items  considered  obsolescent  and  by  relocating  items  whose 
difficulty  level  had  altered  during  the  intervening  years. 
The  test  was,  however,  restandardized  in  1972.  New  norms 
were  derived  from  a  sample  of  approximately  2,100  cases 
during  the  1971-72  school  year.  Children  in  the  1972  norm 
group  were  chosen  from  20,000  school  age  children  in 
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grades  3  through  12  who  were  identified  based  on  scores 
from  the  Cognitive  Abilities  Test.  The  distribution  of 
scores  in  this  subsample  corresponded  to  the  national  dis- 
tribution of  the  entire  sample.  The  1972  norms  were  be- 
lieved to  be  based  on  a  more  representative  sample  than 
previous  norms  (Terman  &  Merrill,  1973). 

The  reliability  of  the  1937  Stanf ord-Binet  was  deter- 
mined by  correlating  IQs  on  forms  L  and  M  administered  to 
the  standardization  group  within  an  interval  of  one  week 
or  less.  Such  reliability  coefficients  are  thus  measures 
of  both  short  term  temporal  stability  and  equivalence 
across  the  two  item  samples.  In  general,  the  Stanford- 
Binet  tends  to  be  more  reliable  for  older  than  for  younger 
age  groups,  and  for  lower  than  for  higher  IQs  (Anastasi, 
1976).  Reliability  coefficients  range  from  .83  to  .98. 
The  Stanf ord-Binet  is  considered  a  highly  reliable  test 
with  most  coefficients  for  the  various  age  and  IQ  levels 
being  over  .90. 

Validity  ratings  for  the  Stanf ord-Binet  were  obtained 
from  examination  of  the  test  content,  from  factor 
analysis,  and  from  correlations  with  achievement  ratings. 
An  examination  of  the  Stanf ord-Binet  tasks  indicates 
assessment  of  a  wide  range  of  reasoning  abilities.  These 
include  tasks  requiring  hand-eye  coordination,  perceptual 
discrimination,   arithmetic  reasoning,   and  verbal 
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reasoning.   The  most  common  type  of  test,  especially  of 
the  upper  age  levels,  is  that  employing  verbal  content. 

Data  on  criterion-related  validity  of  the  Stanford- 
Binet  have  been  obtained  chiefly  in  terms  of  academic 
achievement  (Anastasi,  1976).  Correlations  between  the 
scale  and  school  grades,  teachers'  ratings,  and  achieve- 
ment test  scores  generally  fall  between  .45  and  .75.  The 
Stanford-Binet  tends  to  correlate  highly  with  performance 
in  nearly  all  academic  courses,  but  predominantly  with 
verbal  courses  such  as  English  and  history.  Correlations 
with  achievement  test  scores  show  the  same  pattern.  The 
rigorous  standardization  and  ronorming  of  tho  Stanford 
Binet,  along  with  its  high  validity  and  reliability,  indi- 
cate that  it  was  an  appropriate  IQ  test  for  this  study. 


CHAPTER  III 
METHODOLOGY 


Overview 


In  this  research  study  a  procedure  was  investigated 
for  analyzing  a  comprehensive,  group-administered  achieve- 
ment and  cognitive  abilities  test  to  determine  whether  an 
item  set  can  be  derived  that  discriminates  gifted  from 
high-achieving,  but  not-gifted,  seventh  graders.  when 
such  an  item  set  was  derived,  it  was  compared  to  a  com- 
monly used  IQ  screening  test  to  assess  the  relative  accu- 
racies of  each  procedure  in  discriminating  gifted  from 
not-gifted  students  in  a  second  seventh  grade  population 
sample. 

This  chapter  is  organized  into  the  following  sec- 
tions: (a)  Population  and  Sample,  (b)  Assessment  Instru- 
ments, (c)  Research  Procedures,  (d)  Data  Analysis,  and  (e) 
Methodological  Limitations. 

Population  and  Sample 

The  research  sample  of  179  students  was  drawn  from  a 
population  of  seventh  graders  who  had  been  tested  on  the 
WISC-R  or  S-B  for  the  "gifted  program"  in  the  Hillsborough 
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County  (Florida)  Public  School  District.  Sampling  was 
conducted  at  the  end  of  the  1984-85  school  year.  Nearly 
all  students  had  previously  been  administered  the  SIT.  In 
most  instances,  only  those  students  who  scored  two  stan- 
dard deviations  or  more  above  the  mean  had  been  adminis- 
tered the  WISC-R  or  Stanford-Bine t.  All  students  in  the 
population  also  had  current  CTBS/TCS  scores  on  file.  Some 
students  in  this  population  had  met  intellectual  eligibil- 
ity guidelines  for  the  gifted  program  (on  WISC-R  or  s-B 
criteria)  and  some  had  not.  All  students  were  tested  by 
school  psychologists  during  each  of  the  three  school  years 
under  investigation. 

Simple  random  sampling  was  conducted  by  the  research- 
er at  the  Hillsborough  County  School  Board  office  in  June 
of  1986.  Names  of  the  seventh  graders  who  were  tested  for 
the  gifted  education  program  from  September  of  1982  and 
June  of  1985  were  obtained  from  computer  printouts  con- 
taining data  for  all  students  in  the  district  who  had  been 
tested  by  school  psychologists.  Students  in  the  sample 
pool  were  assigned  a  number,  and  numbers  were  selected 
according  to  a  random  number  table.  Numbers  were  then 
recorded  and  returned  to  the  pool  to  ensure  an  equal 
chance  of  selection  for  the  remaining  numbers.  After  61 
students  were  randomly  selected  for  Phase  II,  the  remain- 
ing 118  students  were  assigned  to  Phase  I. 
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Assessment  Procedures 

As  previously  discussed,  the  five  assessment  instru- 
ments used  for  this  research  were  the  Wechsler 
Intelligence  Scale  for  Children-Revised  (WISC-R),  The 
Stanford-Binet  Intelligence  Scale  (S-B),  the  Slosson  In- 
telligence Test  (SIT),  the  Comprehensive  Test  of  Basic 
Skills  (CTBS)/  and  Test  of  Cognitive  Skills  (TCS). 

Administration,  scoring,  and  interpretation  of  the 
WISC-R  and  Stanford-Binet  were  conducted  by  state  certi- 
fied school  psychologists  prior  to  the  onset  of  this 
study.  All  tests  were  individually  administered  and  hand 
scored  using  current  norms.  The  WISC-R  yields  a  Verbal 
Scale  IQ  (representing  verbal  reasoning  abilities),  a  Per- 
formance Scale  IQ  (representing  perceptual  organization 
and  nonverbal  reasoning),  and  a  Full  Scale  IQ.  Only  the 
Full  Scale  score,  which  represents  total  IQ,  was  used  as  a 
measure  of  gifted  intelligence.  This  procedure  conformed 
to  school  district  guidelines.  The  Stanford-Binet  yields 
a  total  IQ  score  only.  A  cutoff  score  of  127  was  used  as 
the  gifted  cutoff  in  the  district.  Students  attaining  a 
Full  Scale  IQ  of  127  or  greater  were  considered  to  have 
met  the  intellectual  criterion  for  gifted  program  elig- 
ibility. The  IQ  cutoff  was  chosen  by  the  district  as  a 
score  that  is  two  standard  deviations  above  the  test  mean 
(WISC-R  IQ=130),  minus  one  standard  error  of  measurement 
(three  IQ  points)  (S-B  IQ  =  132  minus  5  IQ  points).   The 
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Slosson  Intelligence  Test  was  administered  to  students  as 
an  IQ  screening  procedure  by  school  guidance  counselors  or 
curriculum  specialists  who  typically  had  little  formal 
training  in  administration  of  individual  intelligence 
tests.  SITs  were  given  to  students  within  one  year  prior 
to  WISC-R  testing.  SIT  protocols  were  hand  scored  by  the 
test  administrators.  Under  usual  circumstances,  a  total 
IQ  of  135  was  used  as  a  screening  cutoff.  Children  who 
scored  at  or  above  this  cutoff  point  were  normally  re- 
ferred to  the  school  psychologist  for  WISC-R  evaluation. 
There  were  some  exceptions  to  this  rule  because  occasion- 
ally students  who  did  not  score  at  or  above  the  cutoff 
were  referred.  Generally,  these  students  exhibited  ex- 
tremely high  academic  skills  or  other  competencies  that 
compelled  school  personnel  to  refer  them  for  formal  test- 
ing. The  SIT  yields  a  total  IQ  score.  The  135  cutoff 
score  is  two  standard  deviations  above  the  test  mean. 

The  CTBS  and  TCS  were  administered  to  students  by 
classroom  teachers  in  group  format  according  to  standard- 
ization procedures  found  in  the  teacher's  manual.  The 
CTBS  and  TCS  were  designed  to  be  easily  administered 
(Ahmann,  1972),  and  teachers  have  received  little  formal 
training  in  their  administration.  The  seventh-grade  level 
of  the  CTBS  (Level  H)  yields  subscale  scoring  in  Reading 
(two  sections),  Spelling,  Language  (two  sections), 
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Mathematics  (two  sections),  Reference  Skills,  Science,  and 
Social  Studies.  The  Science  and  Social  Studies  subscales, 
consisting  of  86  items,  were  not  administered  in  all 
schools  of  the  population  school  district;  however,  these 
data  were  included  in  this  research.  In  addition  to  sub- 
scale  scores,  the  CTBS  yields  an  overall  achievement  index. 

The  TCS  is  comprised  of  four  cognitive  ability  sub- 
tests measuring  competencies  in  Sequencing,  Analogies, 
Memory,  and  Verbal  Reasoning.  Derived  scores  are  provided 
for  subtests  and  for  the  overall  profile.  Because  in  this 
study  responses  to  individual  test  items  were  analyzed, 
derived  subscale  and  total  scores  for  the  CTBS  scores  were 
not  utilized. 

Research  Procedures 

In  Phase  I  of  this  study  items  comprising  the  new 
procedure  (NP)  were  selected  from  the  CTBS  and  TCS  by  con- 
ducting item  analyses  of  the  performance  of  tho  118  stu- 
dents in  the  Phase  I  sample.  Two  sets  of  items  were  de- 
lineated that,  in  general  terms,  were  answered  correctly 
by  the  gifted  students  more  frequently  than  by  the  not- 
gifted  students. 

Gifted  cutoff  scores  on  the  obtained  NP  items  were 
computed  by  subtracting  fractions  or  multiples  of  standard 
deviations  from  the  mean  NP  score  of  the  gifted  group  to 
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determine  which  cutoff  point(s)  most  accurately  predicted 
giftedness.   Newly  referred  students  in  the  school  setting 
who  scored  at  or  above  that  point  would  be  referred  for 
formal  IQ  testing.   By  adopting  a  cutoff  score  two  stan- 
dard deviations  below  the  gifted  mean,  approximately  97% 
of  intellectually  gifted  students  would  be  referred  for  IQ 
testing  based  on  research  sample  parameters.   A  cutoff 
score  incorporating  a  one  standard  deviation  below  the 
mean  cutoff  would  delete  approximately  16%  of  the  gifted 
students  from  testing  and  gifted  program  eligibility. 
However,  a  pitfall  of  including  as  broad  a  range  of  stu- 
dents as  permitted  by  the  two  standard  deviations  crite- 
rion was  that  a  relatively  large  number  of  not-gifted  stu- 
dents would  also  be  referred  for  formal  IQ  testing,-  thus 
reducing  the  accuracy  of  NP  predictions  of  giftedness. 
Adjustment  of  the  cutoff  point  was  desirable  to  ensure  an 
optimal  ratio  of  gifted  students  accurately  predicted  to 
not-gifted  students  inaccurately  predicted. 

After  items  for  the  NP  were  obtained  and  multiple 
cutoff  points  was  established,  Phase  II  was  begun.  in 
Phase  II,  test  scores  based  on  selected  NP  items  were  com- 
puted for  the  second  sample  of  students.  The  NP  was  an- 
alyzed in  terms  of  its  discrimination  between  the  gifted 
and  not-  gifted  students  on  WISC-R.  The  accuracy  of  clas- 
sification was  then  compared  to  accuracy  of  classification 
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obtained  using  students'  SIT  scores  as  a  screening 
procedure  in  predicting  gifted  IQ  on  the  WISC-R. 

Data  collectors  and  recorders  consisted  of  the  re- 
searcher and  employees  of  the  district  school  board. 
School  board  employees  working  in  the  testing  and  evalua- 
tion office  obtained  CTBS/TCS  item  responses  from  computer 
data.  WISC-R  and  SIT  scores  were  obtained  in  a  similar 
manner  by  the  researcher  and  research  assistant. 

Data  Analysis 


As  discussed  in  Chapter  I,  two  correlational  analyses 
were  conducted  in  Phase  I  of  this  study  to  determine  de- 
sirable items  for  the  NP.  Phi  coefficients,  which  are 
designed  to  correlate  two  dichotomous  variables,  were  com- 
puted. The  two  dichotomous  variables  correlated  were  the 
student's  item  response  (correct  or  incorrect)  and  student 
classification  (gifted  or  not-gifted)  on  the  WISC-R.  The 
criterion  for  deciding  if  an  item  was  to  be  included  was 
significance  at  the  .05  level.  For  the  present  sample, 
this  meant  that  any  item  with  a  phi  coefficient  greater 
than  .182,  was  selected  for  the  NP  screening  test. 

Phi  is  based  on  the  proportions  of  cases  passing  and 
failing  an  item  in  both  the  gifted  and  not-gifted  crite- 
rion groups.   The  phi  coefficient  is  known  to  be  biased 


56 


toward  middle  difficulty  levels  of  test  items.  As  previ- 
ously discussed,  the  research  (CTBS  and  TCS)  items  were 
designed  primarily  to  assess  performance  of  medium  diffi- 
culty. 

Test  items  were  also  analyzed  according  to  the  index 
of  discrimination  (Ebel,  1965).  The  difference  between 
the  percentage  of  gifted  students  and  not-gifted  students 
passing  each  item  provides  an  index  of  item  validity  that 
can  be  interpreted  independently  of  the  size  of  the  par- 
ticular sample  in  which  it  was  obtained  (Anastasi,  1976). 
The  index  of  discrimination  (D)  has  been  shown  to  measure 
item  validity  with  equivalent  accuracy  to  other  more  elab- 
orate measures  (Engelhart,  1965).  Similar  to  the  phi,  D 
values  are  biased  in  favor  of  items  with  intermediate  dif- 
ficulty levels.  A  coefficient  of  .20  or  greater  was  used 
as  the  criterion  for  selecting  an  item  for  the  NP.  Thus 
in  Phase  I,  two  forms  of  the  NP  screening  test  were 
created;  one  form,  here  designated  as  NP-phi,  was  based  on 
items  selected  using  phi  coefficients;  the  other  form,  NP- 
D,  was  based  on  items  selected  using  the  index  of  discrim- 
ination. 

For  Phase  II,  analyses  using  the  coefficient  Kappa 
were  conducted  to  test  (a)  whether  the  NP  or  SIT  was  more 
accurate  in  classifying  gifted  seventh  graders,  and  (b)  at 
what  cutoff  points  either  the  NP  or  SIT  was  more  accurate. 
The  Kappa  analysis  measured  the  proportion  of  correct  (and 


57 


incorrect)  classifications  for  NP-phi,  NP-D,  and  the  SIT 
while  adjusting  for  the  percentage  of  correct  classifica- 
tions that  could  be  expected  on  the  basis  of  chance  alone. 
Kappa  is  a  descriptive  statistic  and  not  a  test  of  statis- 
tical significance. 

Using  the  Kappa  statistic  proportions  of  predictions 
were  compared  at  various  cutoli  points.  Kappa  adjusts  for 
predictions  expected  by  chance  alone  by  taking  into 
account  both  observed  and  expected  proportion  classifica- 
tions (Cohen,  1960).  For  example,  when  considering  K  val- 
ues for  the  NP,  the  numerator  of  K  is  regarded  as  the  pro- 
portion of  students  consistently  classified  by  both  the  NP 
and  the  WISC-R  (observed)  over  and  above  the  product  of 
the  proportions  of  students  classified  individually  by 
each  test  (expected).  The  denominator  of  K  is  the  maximum 
possible  increase  in  decision-consistency  above  chance 
level,  given  the  proportions  classified  by  the  two  tests 
independently. 

The  formula  used  to  compute  Kappa  was 


K  =  P  -  PC 
1  "  Pc 


where  P  =  proportion  of  consistent  gifted  and  not-gifted 
classifications  for  WISC-R  and  NP  or  WISC-R  and 
SIT 


and  Pc  =  proportion  of  gifted  classifications  for  WISC-R 
x  proportion  of  gifted  classifications  for  NP 
(or  SIT)  +  proportion  of  not-gifted  classifica- 
tions for  WISC-R  x  proportion  of  not-gifted 
classifications  for  NP  or  SIT. 

Methodological  Limitations 

Possibly  the  most  severe  methodological  limitation  of 
this  study  concerns  the  appropriateness  of  the  CTBS  and 
TCS  for  discriminating  students  who  score  in  a  restricted 
range  near  the  ceiling  of  the  test.  The  CTBS/TCS  tests 
were  deemed  as  appropriate  for  this  research  because  the 
data  were  readily  available  and,  if  usable,  would  preclude 
students  from  taking  a  gifted  screening  test.  Also,  the 
CTBS  and  TCS  measure  the  wide  range  of  skills  and  abil- 
ities. However,  because  the  CTBS  and  TCS  were  designed  to 
measure  traits  of  the  general  population  of  students,  they 
were  less  sensitive  to  group  differences  in  the  extreme 
ranges  of  ability  and  particularly  at  the  ceiling  level. 
The  location  parameter  indicated  the  ability  level,  in 
scale  score  units,  at  which  the  item  was  most  sensitive  to 
individual  differences.  Thus,  test  items  were  designed  to 
have  their  greatest  sensitivity  to  individual  differences 
in  the  general  range  where  most  students  taking  the  test 
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would  score.  Since  on  most  of  the  items,  students  in  the 
research  sample  were  expected  to  score  well  above  the 
location  parameter,  most  of  the  items  provided  little  dif- 
ferentiation among  these  students  with  much  higher  scale 
scores.  However,  in  defense  of  these  instruments  for  use 
in  this  study,  some  items  with  extreme  location  parameters 
were  purposely  included  in  the  CTBS  by  its  constructors. 
These  items  were  well  above  or  below  the  range  of  perfor- 
mance for  which  the  test  was  designed.  Item  characteris- 
tic curves  indicate  the  existence  of  CTBS  items  that  were 
passed  typically  only  by  students  with  very  high  total 
scores  on  the  test  (CTB/McGraw-Hil 1 ,  1984).  IRT  item  lo- 
cation parameters  for  the  TCS  had  not  been  calculated, 
though  since  the  TCS  was  designed  to  predict  achievement 
for  the  general  population,  it  might  be  assumed  that  loca- 
tion parameter  criteria  would  adhere  to  a  similar  ration- 
ale. 

A  second  limitation  of  this  study  concerns  measure- 
ment error  due  to  the  variable  adherence  to  test  standard- 
ization by  administrators  of  the  CTBS,  TCS  and  the  SIT. 
Because  teachers  and  counselors  receive  varying  degrees  of 
formal  training  on  the  importance  of  precise  adherence  to 
test  standards  and  on  the  influence  of  standardization  de- 
viations on  test  reliability,  administrators'  strict  con- 
formity to  test  standards  was  questionable.  Because  very 
little  training  was  provided  for  the  CTBS/TCS  and  SIT, 
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misinterpretation  of  instructions  or  standards  was  pos- 
sible as  well. 

A  related  source  of  potential  measurement  error 
existed  due  to  the  group  format  by  which  the  CTBS/TCS  is 
administered.  As  opposed  to  the  WISC-R  and  SIT,  adminis- 
trators were  restricted  in  their  ability  to  closely  mon- 
itor individual  students  and  control  for  such  factors  as 
misinterpreted  directions  or  acute  physical  or  emotional 
liabilities  of  students.  Since  regular  CTBS/TCS  testing 
was  conducted  only  once  a  year  in  the  research  school  dis- 
trict, efforts  were  made  to  test  as  many  students  as  pos- 
sible during  that  time.  However,  SIT  and  WISC-R  testing 
were  more  easily  postponed  to  a  later  date  if  a  situation 
warranted  such  action. 

Another  limitation  of  the  CTBS  specific  to  this  study 
was  that  some  students  were  not  administered  the  Science 
and  Social  Studies  subtests  in  the  school  district,  be- 
cause they  were  optional  and  were  administered  only  in 
some  schools.  These  subtests  are  comprised  of  40  items 
each  and  may  contain  items  useful  for  discrimination  of 
gifted  and  not-gifted  students.  if  deleted  from  analysis, 
these  items  may  detract  from  the  overall  accuracy  of  the 
proposed  new  procedure  in  accomplishing  its  intended  goal. 

The  administration  of  the  WISC-R  and  SIT  over  a 
3-year  period  suggests  a  question  regarding  score 


6  1 


equivalence.  One  might  argue  that  since  the  items  admin- 
istered to  a  9-year-old  and  an  11-year-old  are  different 
and  represent  different  test  difficulty  levels,  the  two 
students  are,  in  fact,  being  tested  on  different  scales. 
This  argument  implies  that  gifted  intelligence  for  the  9- 
year-old  is  not  equivalent  to  gifted  intelligence  for  the 
11-year-old.  'To  the  contrary,  while  imperfections  in  test 
stability  will  cause  some  fluctuation  in  IQ  over  time,  the 
adoption  of  the  deviation  IQ  (Wechsler,  1974)  permits  com- 
parison of  scores  over  age  levels. 


CHAPTER  IV 
RESULTS 


This  chapter  is  presented  in  a  format  that  sequen- 
tially reflects  the  methodological  progression  of  the 
study.  Phase  I  will  be  discussed  first  to  present  results 
of  two  analyses  used  to  select  items  for  the  new  screening 
procedure  (NP).  Next,  Phase  II  cross  validation  results 
are  presented  in  terms  of  prediction  accuracy  of  the  NP- 
phi,  NP-D,  and  SIT  in  relation  to  cutoff  scores. 

Phase  I  —  Item  Selection 

Phi  Coefficients 

The  phi  coefficient  analysis  of  the  460  CTBS  and  TCS 
items  yielded  56  items  that  discriminated  the  59  gifted 
and  59  not-gifted  seventh  graders.  This  number  of  items 
is  over  twice  as  many  as  would  be  expected  by  chance  at 
the  ^.05  level  of  confidence.  There  were  13  items  signif- 
icant at  the  £.01  level  and  three  significant  at  ^.001 
(Table  4-1).  As  also  shown  in  Table  4-1,  all  CTBS  and  TCS 
subtests  contributed  items  except  for  CTBS  Reference 
Skills  and  TCS  Sequences. 

Not  all  of  the  56  significant  phi  analyses  items  were 
retained  for  use  in  Phase  II.   m  five  cases  the  phi  items 
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Table 

4-1.   Item  Phi 
Levels 

Values  and  S 

ignif icance 

Item  No. 

Subtest 

Phi-Value 

Significance 

16 

CTBS-Voc. 

.2059 

£.05 

39 

Voc. 

.3258 

£.001 

40 

Voc. 

.1834 

£.05 

41 

Voc. 

.2069 

£.05 

49 

Read . 

.1963 

£.05 

53 

Read. 

.2172 

£.05 

61 

Read . 

.2266 

£.05 

65 

Read. 

.1977 

£.05 

77 

Read . 

.1963 

£.05 

80 

Read. 

.2502 

£.01 

83 

Read . 

.2386 

£.01 

85 

Read. 

.2502 

£.01 

109 

Spell . 

.2645 

£.01 

113 

Spell . 

.2285 

£.01 

120 

Spell . 

.2199 

£.01 

122 

Lang . 

.2652 

£.01 

127 

Lang  . 

.1842 

£.05 

129 

Lang. 

.2069 

£.05 

136 

Lang . 

.2187 

£.05 

137 

Lang . 

.2377 

£.01 

139 

Lang  . 

.2559 

£.01 

154 

Lang . 

.2035 

£.05 

159 

Lang  . 

.1842 

£.05 

171 

Lang . 

.1913 

£.05 

176 

Lang . 

.2934 

£.01 

177 

Lang . 

.2035 

£.05 

180 

Lang  . 

.2018 

£.05 

182 

Lang . 

.1842 

£.05 

187 

Lang  . 

.1913 

£.05 

193 

Lang . 

.1834 

£.05 

205 

Math 

.2187 

£.05 

215 

Math 

.3007 

£.01 

*216 

Math 

.1905 

£.05 

220 

Math 

.3245 

£.001 

230 

Math 

.3564 

£.001 

236 

Math 

.2652 

£.01 

264 

Math 

.2161 

£.05 

266 

Math 

.2331 

£.05 

270 

Math 

.2146 

£.05 

271 

Math 

.2806 

£.01 

273 

Math 

.2784 

£.01 

276 

Math 

.1885 

£.05 

277 

Math 

.2188 

£.05 

(2)311 

Sci  . 

.3214 

<-05 
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Table  4-1.   Continued. 


*(2)343  Soc.  St.  .2812  ^.05 

(2)364  Soc.  St.  .3076  ^.05 

(2)366  Soc.  St.  .3921  ^.01 

(2)378  Soc.  St.  .3076  ^.05 


* 


Subtest 

Soc. 

St. 

Soc . 

St. 

Soc . 

St. 

Soc . 

St. 

TCS-Anal . 

Mem. 

Mem . 

Mem. 

Mem. 

Mem. 

Verb 

.  Reas 

Verb 

.  Reas 

Item  No.         Subtest       Phi-Value       Significance 


407  TCS-Anal.  .1977  ^.05 

*421  Mem.  .2148  <.05 

425       '  Mem.  .1916  <.05 

429  Mem.  .2035  <.05 

4  34  Mem.  .1835  <.0  5 

*438  Mem.  .2068  ^.05 

440  Verb.  Reas.  .1858  ^.05 

457  Verb.  Reas.  .2168  ^.05 


*  Item  discriminates  in  favor  of  not-gifted, 
2  Answered  by  less  than  90%  of  sample. 
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discriminated  in  favor  of  the  not-gifted  students.   That 
is,  not-gifted  students  responded  correctly  to  the  items 
more  frequently  than  the  gifted  students.   Also  deleted 
from  Phase  II  analyses  were  five  items  responded  to  by 
only  about  half  of  the  Phase  I  sample.   These  five  items 
were  located  in  the  Science  and  Social  Studies  subtests, 
which  were  administered  on  an  optional  basis  at  the  dis- 
cretion of  the  various  schools  involved.   They  were  de- 
leted because  the  cutoff  scores  were  set  based  on  total 
items  administered  to  all  students.   Those  items  not  ad- 
ministered to  all  students  were  deleted  so  that  all  stu- 
dents could  potentially  attain  the  maximum  raw  score. 
Therefore,  47  phi  items  were  retained  for  use  in  Phase  II, 
42  from  the  CTBS  and  5  from  the  TCS. 

The  mean  of  the  total  scores  for  the  47  items  se- 
lected by  the  phi  analyses  was  39.37.  The  median  and  mode 
were  somewhat  higher,  41.00  and  46.00,  respectively.  The 
standard  deviation  was  7.06  and  the  range  was  41.  Most 
students  performed  well  on  these  items,  with  most  respond- 
ing correctly  to  nearly  all  of  them.  This  homogeneity  of 
scores  is  reflected  by  the  negative  skew  of  the  distribu- 
tion. Even  though  scores  appeared  to  concentrate  near  the 
upper  end  of  the  distribution,  the  large  range  of  scores 
(i.e.,  4  through  46)  contributed  to  a  standard  deviation 
of  adequate  size. 
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Index  of  Discrimination 

The  index  of  discrimination  analyses,  which  measured 
the  difference  between  the  percentage  of  gifted  and  not- 
gifted  students  passing  each  item,  yielded  a  subset  of  24 
items  with  D  values  of  .20  or  greater.  As  recommended  by 
Engelhart  (1965),  items  possessing  a  D  value  of  at  least 

20  are  considered  to  show  adequate  discrimination.  All 
24  of  the  acceptable  items  were  obtained  from  the  CTBS. 

In  Table  4-2  the  upper  and  lower  values  represent 
percentage  passing  each  item  for  the  gifted  and  not-gifted 
students,  respectively.   D  values  ranged  from  .221  through 

.393.  As  might  be  expected,  the  highest  percentage  of 
correct  responding  occurred  with  the  beginning  TCS  subtest 
items.  This  was  because  TCS  items  are  ordered  hierarch- 
ically by  subtest  according  to  difficulty  level. 

In  this  analysis,  also,  an  item  was  deleted  if  it 
discriminated  in  favor  of  the  not-gifted  group  or  if  it 
was  not  administered  to  over  10%  of  the  sample.  In  con- 
trast to  the  phi  analyses,  a  large  proportion  of  items 
(i.e.,  10  or  42%)  were  deleted.  Fourteen  items  (Table  4- 
2)  were  retained  from  the  CTBS. 

In  contrast  with  the  phi  data,  results  of  scores 
based  on  selected  items  from  the  D  analyses  revealed  a 
relatively  normal  distribution,  with  the  mean  of  10.02, 
median  of  10.02,  and  mode  of  11.00  falling  within  a  range 
of  one  test  item.   The  standard  deviation  was  2.62  and  the 
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Table  4-2.   Items  with  D  Values  >.20  and  Corres- 
ponding Subtests 


Item  No.       Subtest       Upper  x      Lower  x       D 


39  CTBS.  Voc.  .930  .667  .263 

80  Read.  .912  .712  .200 

85  Read.  .912  .712  .200 

109  Spell  .931  .731  .200 

158  Lang.  .958  .867  .208 

176  Lang.  .897  .650  .327 

185  Lang.  .879  .667  .212 

215  Math  .842  .567  .275 

230  Math  .873  .552  .323 

271  Math  .948  .746  .202 

273  Math  .947  .746  .201 

277  Math  .737  .525  .212 

(2)281  Math  .750  .538  .212 

(2)311  Sci.  1.000  .735  .265 

317  Sci.  .937  .727  .210 

(2)330  Sci.  .937  .727  .210 

(2)341  Soc.  St.  .875  .667  .208 

(2)343  Soc.  St.  .688  .909  -.221* 

351  Soc.  St.  .937  .727  .  .210 

(2)360  Soc.  St.  .937  .467  .240 

(2)364  Soc.  St.  1.000  .435  .242 

(2)366  Soc.  St.  .938  .506  .393 

(2)377  Soc.  St.  .937  .727  .210 

(2)378  Soc.  St.  1.000  .758  .242 


Item  discriminates  in  favor  of  not-gifted. 

2  answered  correct  by  less  than  90%  of  the  sample. 
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range  was  14.  Unlike  the  phi  distribution,  the  ceiling  of 
the  D  distribution  was  probably  sufficiently  high  because 
correct  responding  diminished  beyond  raw  scores  of  11. 
Only  11  of  117  students  received  a  raw  score  above  12, 
while  65  received  scores  within  one  point  of  the  mean. 
Taking  into  account  the  small  number  of  NP-D  items  (14), 
the  standard  deviation  of  2.6  is  considered  to  be  ade- 
quately large  for  calculating  cutoffs.  The  range  of 
scores  covered  both  extremes  of  the  distribution,  and  the 
skew  is  not  as  great  as  might  be  expected  given  the  gen- 
erally restricted  range  of  the  sample  at  the  upper  ability 
1 cvcl s . 

Consistent  with  previous  research  (e.g.,  Engelhart, 
1965),  there  were  commonalities  between  the  phi  and  D  an- 
alyses results  for  this  sample.  Sixteen  items,  or  67%  of 
the  D  and  9%  of  the  phi  items,  were  selected  from  both 
analyses.  All  common  items  were  from  the  CTBS  (on 
achievement  test)  because  the  D  analyses  yielded  no  TCS 
(cognitive  ability)  NP  items.  Similarly,  the  one  D  analy- 
sis item  found  to  discriminate  in  favor  of  the  not-gifted 
students  also  did  so  on  the  phi  analysis. 

Cutoff  Scores 

Cutoff  scores  for  the  phi,  D,  and  SIT  analyses  were 
computed  in  order  to  demarcate  optimal  cutting  points  for 
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differentiating  the  gifted  from  not-gifted  students.  Cut- 
off scores  were  established  such  that  a  maximum  number  of 
gifted  and  a  minimum  number  of  not-gifted  students  would 
fall  above  the  cutoff.  Therefore,  the  true  positive  and 
true  negative  findings  were  maximized  while  the  false  pos- 
itives and  false  negatives  were  minimized.  This  procedure 
required  a  "value  judgment"  by  the  rater  as  to  the  amount 
of  error  which  would  be  acceptable.  As  the  cutoff  is  low- 
ered to  permit  more  gifted  students  to  exceed  it,  increas- 
ing numbers  of  not-gifted  students  would  also  exceed  it, 
thus  increasing  the  possibility  of  false  negatives.  To 
assist  with  this  problem,  multiple  cutoff  scores  were  cal- 
culated so  that  the  most  desirable  cutoffs  could  be  deter- 
mined. Cutoff  scores  were  chosen  according  to  fractions 
of  standard  deviation  units  for  the  total  score  distribu- 
tions on  the  SIT,  the  new  test  created  by  selecting  items 
with  phi,  and  the  new  test  created  by  selecting  items  with 
D.  The  mean  SIT  IQ  for  the  Phase  I  sample  was  137.30  and 
the  standard  deviation  was  14.12.  This  mean  IQ  was  ap- 
proximately 37  points  higher  than  that  of  the  general  pop- 
ulation. It  reflects  the  above  average  general  intelli- 
gence of  the  research  sample.  In  Phase  II,  these  standard 
deviation  cutoff  scores  calculated  as  standard  deviations 
from  the  Phase  I  sample,  were  applied  to  the  cross  valida- 
tion sample  to  determine  the  relative  accuracy  of  gifted 
classifications  using  the  NP-phi,  NP-D,  and  SIT. 
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Phase  II--Cross  Validation 

Application  of  Cutoff  Scores 

The  first  step  in  testing  the  NP  items  obtained  in 
Phase  I  was  to  calculate  the  numbers  of  NP  items  scored 
correctly  for. the  Phase  II  sample  and  then  to  apply  appro- 
priate cutoff  points  to  those  scores.  The  Phase  II  sub- 
sample  consisted  of  61  students.  Proportions  of  correct 
classifications  among  the  NP-phi,  NP-D  and  SIT  items  were 
then  analyzed  to  determine  the  test  and  cutoff  that  most 
accurately  differentiated  the  gifted  and  not-gifted  stu- 
dents . 

Tables  4-3,  4-4,  and  4-5  show  the  numbers  of  examin- 
ees classified  into  prediction  categories  at  various  cut- 
off points  where  the  examinee's  true  status  was  defined  as 
classification  of  gifted  or  not-gifted  according  to  the 
WISC-R.  For  all  three  classification  procedures  (NP-phi, 
NP-D,  and  SIT)  little  variability  existed  between  predic- 
tion rates  at  some  of  the  cutoffs.  For  example,  classif- 
ications at  the  +1.0  and  -1.0  differed  only  slightly  from 
those  at  the  +.66  and  -.66  cutoffs  respectively.  A  sim- 
ilar result  occurred  for  the  +.50  cutoff  compared  to  the 
+.30  cutoff  and  for  the  -.50  cutoff  compared  to  the  -.30 
cutoff  . 
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Table  4-3.   Phase  II:   Number  of  Examinees  in  Each 
Prediction  Category  by  Cutoff/Phi 


True 
Positive 

True 
Negative 

True 
Total 

False 
Negative 

False 
Positive 

False 
Total 

Cutoff 

• 

-1.00 

SD 

30 

5 

35 

2 

24 

26 

-  .66 

SD 

30 

5 

35 

2 

24 

26 

-  .50 

SD 

30 

5 

35 

2 

24 

26 

-  .33 

SD 

30 

8 

^8 

2 

21 

23 

0.00 

SD 

29 

12 

41 

3 

17 

20 

+  .33 

SD 

23 

17 

4  0 

9 

12 

21 

+  .50 

SD 

18 

19 

37 

il 

10 

24 

+  .66 

SD 

8 

24 

32 

2  4 

5 

29 

+  1.00 

SD 

3 

29 

32 

2  9 

0 

2  9 

Total  gifted  =  32 
Total  not-gifted  =  29 
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Table  4-4.  Phase  II:  Number  of  Examinees  in  Each 
Prediction  Category  by  Cutoff/Index  of 
Discrimination 


True 
Positive 

True 
Negative 

True 
Total 

False 

Negative 

False 
Positive 

False 
Total 

Cutoff 

-1.00 

SD 

31 

5 

36 

1 

24 

25 

-  .66 

SD 

31 

9 

4  0 

1 

20 

21 

-  .50 

SD 

31 

9 

4  0 

1 

2  0 

21 

-  .33 

SD 

31 

10 

41 

1 

19 

20 

0.00 

SD 

26 

15 

41 

6 

14 

20 

+  .33 

SD 

26 

15 

41 

6 

14 

20 

+  .50 

SD 

22 

19 

41 

10 

10 

20 

+  .66 

SD 

22 

19 

41 

10 

10 

20 

+  1.00 

SD 

15 

24 

39 

17 

5 

22 
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Table  4-5.   Phase  II:   Number  of  Examinees  in  Each 
Prediction  Category  Cutoff/SIT 


Cut- 
off 

True 
Positive 

True 
Negative 

True 
Total 

False 
Negative 

False 
Positive 

False 
Total 

-1.0 

SD 

32 

6 

38 

1 

22 

23 

-  .66 

SD 

32 

7 

3  9 

1 

21 

22 

-  .50 

SD 

31 

7 

38 

2 

21 

23 

-  .33 

SD 

28 

9 

37 

5 

19 

24 

.00 

SD 

22 

18 

40 

11 

10 

21 

+  .33 

SD 

14 

22 

36 

19 

6 

25 

+  .50 

SD 

12 

24 

36 

2  0 

5 

25 

+  .66 

SD 

9 

26 

35 

23 

2 

25 

+  1.0 

SD 

5 

29 

34 

27 

0 

27 
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Table  4-6  was  constructed  to  consolidate  and  clarify 
the  data  on  predictive  accuracy,  in  that  table,  propor- 
tions of  correct  and  incorrect  predictions  are  again 
expressed  as  true  positives  (TP),  true  negatives  (TN), 
false  positives  (FP),  and  false  negatives  (FN).  Deleted 
were  the  cutoffs  -1.0  SD,  -.50  SD,  +.50  SD,  and  -1.0  SD. 
Inspection  of  Table  4.6  reveals  that  all  three  methods  of 
screening  gifted  students  were  approximately  equivalent. 
That  is,  all  three  found  the  greatest  proportions  of  TP 
predictions  in  .492  through  .525  of  the  sample.  Lowest  TN 
predictions  occurred  in  .082  through  .148  of  the  sample. 

After  the  prediction  rates  were  established  for  the 
three  tests  and  the  prediction  values  were  expressed  as 
proportions  of  the  sample,  analyses  were  conducted  to  de- 
termine the  proportions,  if  any,  that  were  most  accurate. 
Coefficient  Kappa  analyses  were  calculated  to  assess  the 
relative  gifted  prediction  accuracy  of  the  NP-phi,  NP-D, 
and  SIT. 

Kappa  Comparisons 

In  results  shown  in  Table  4-7,  P  values  correspond  to 
percentages  of  correct  classifications  (both  true  pos- 
itives and  true  negatives).  However,  two  equal  classifi- 
cation percentages  (e.g.,  as  observed  for  phi  and  D  at  0.0 
cutoff)  may  not  have  equal  corresponding  K-values  because 
the  Kappa  equation  adjusts  for  chance  correct  predictions. 
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Table  4-6.   Phase  II:   Proportions  of  Correct  and 
Incorrect  Predictions  for  NP  (phi), 
NP  (D),  and  SIT  at  Two  Score  Cutoffs 


z-      True      True    True     False    False    False 
Cutoff   Positive   Negative   Total   Negative   Positive   Total 


Phi 


- 

.66  SD 

.492 

.082 

.574 

.033 

.393 

.426 

- 

.33  SD 

.492 

.131 

.623 

.033 

.344 

.377 

.00  SD 

.475 

.197 

.672 

.049 

.229 

.328 

+ 

.33  SD 

.377 

.279 

.656 

.164 

.180 

.344 

+ 

.66  SD 

.131 

.393 

.525 

.393 

.082 

.475 

D 


-  .66  SD 

.508 

.148 

.656 

.016 

.328 

.344 

-  .33  SD 

.508 

.164 

.672 

.016 

.311 

.328 

.00  SD 

.426 

.246 

.672 

.098 

.230 

.328 

+  .33  SD 

.426 

.246 

.672 

.098 

.230 

.328 

+  .66  SD 

.361 

.311 

.672 

.164 

.164  " 

.328 

SIT 


-  .66  SD 

.525 

.115 

.639 

.016 

.344 

.361 

-  .33  SD 

.459 

.148 

.607 

.082 

.311 

.393 

.00  SD 

.361 

.295 

.656 

.180 

.164 

.344 

+  .33  SD 

.270 

.360 

.590 

.311 

.082 

.410 

+  .66  SD 

.164 

.426 

.590 

.361 

.049 

.410 

Note:   All  values  rounded  to  nearest  .10  percentile 
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Table  4-7.   Percentage  of  Correct  Classifications 
and  Kappa  Values  for  NP  (phi  and  D) 
and  SIT  at  Various  Cutoff  Scores 


NP 

SIT-P* 

NP 

Cutoff 

Phi-f 

* 

D-P* 

Phi-k 

1   D-k1 

SIT-k1 

-  .66  SD 

57.4 

65.6 

63.9 

.114 

.289 

.234 

-  .33  SD 

62.3 

67.2 

60.7 

.220 

.324 

.178 

.00  SD 

67.2 

67.2 

65.6 

.328 

.334 

.309 

+  .33  SD 

65.6 

67.2 

59.0 

.306 

.320 

.258 

+  .66  SD 

52.5 

67.2 

59.0 

.076 

.342 

.204 

*P  =  percentage 
k  =  Kappa  value 
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An  example  of  the  Kappa  calculation  of  percentages  of 
correct  classifications  for  NP-phi  at  the  -.66  cutoff  is 
illustrated  using  four-fold  tables  (Figure  4-1).  Values 
in  the  equation,  along  with  percentages  of  examinee  clas- 
sifications at  other  cutoffs,  are  located  in  Table  4-6. 

1  "  Pc 

P  =  .492  +  .082  =  .574 
Pc  =  (  .885)  (.525)  +  (  .475)  (  .115) 
=  .4646  +  .0546 
=  .5192 
Results  of  the  Kappa  analyses  revealed  a  consistent 
pattern  of  greater  values  for  the  NP-D  than  for  the  NP-phi 
or  the  SIT  at  all  cutoffs,  suggesting  the  NP-D  may  be  the 
more  accurate  predictor  of  correct  classification.  The 
greatest  K  value  for  NP-D  occurs  at  the  +.66  where  the  K 
value  of  .342  indicates  that  there  is  a  34%  improvement  in 
prediction  accuracy  over  that  expected  by  chance.   The 
greatest  discrepancy  between  K  values  for  NP-D  and  the  SIT 
appears  at  the  -.33  cutoff  where  K  =    .324  for  NP-D  but 
only  .178  for  the  SIT. 
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.475 
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Figure  4-1 


Computation  Example  of  Kappa 
analysis  for  True  Phi  Classi- 
fications at  -.66  SD  Cutoff 
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Summary  of  Results 

In  Phase  I  of  this  study  56  items  from  the  CTBS/TCS 
ere  identified  using  phi  coefficients  that  significantly 
differentiated  qiftod  from  not-qirtod  students.  Tho  i  ndox 
of  discrimination  analyses  yielded  24  items  that  success- 
fully discriminated  gifted  from  not-gifted.  A  large  pro- 
portion of  significant  D  items  were  common  to  the  phi 
analysis  results. 

In  Phase  II,  total  scores  for  each  subset  of  items 
were  computed  using  a  cross  validation  sample  of  exam- 
inees. The  accuracy  of  classifications  resulting  from 
each  of  the  two  new  tests  and  the  SIT  were  contrasted  at 
five  different  cutoff  scores.  Kappa  analyses  revealed 
higher  correct  prediction  values  for  the  NP-D  than  for 
either  the  NP-phi  and  the  SIT  at  all  cutoff  levels. 


CHAPTER  V 
DISCUSSION 


Research  Questions 


In  order  to  answer  the  first  research  question,  this 
discussion  focuses  on  issues  related  to  NP  item  validity 
from  Phase  I.  Phase  II  NP  items  were  selected  from  the 
CTBS  and  TCS  using  correlational  analyses.  To  answer  the 
remaining  two  research  questions,  the  predictive  accura- 
cies of  the  NP-phi,  NP-D,  and  SIT  are  discussed  in  terms 
oT  cutoff  scores. 

The  following  research  questions  were  addressed  in 
this  study. 

1.  Can  an  accurate  predictor  of  gifted  IQ  classifi- 
cation on  the  WISC-R/S-B  be  derived  from  an  in- 
strument composed  of  items  on  the  CTBS  and  TCS 
in  a  situation  in  which  giftedness  is  viewed  as 
a  dichotomous  trait  variable? 

2.  Is  the  NP  more  accurate  than  the  SIT  in  classi- 
fying gifted  and  not-gifted  seventh  graders? 

3.  At  what  cutoff  point(s)  is  the  NP  more  accurate 
than  the  SIT  in  classifying  gifted  and  not- 
gifted  seventh  graders? 
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Phase  I 

In  Phase  I  of  the  study  it  was  suggested  that  some 
items  derived  from  the  CTBS/TCS  were  valid  for  distin- 
guishing between  gifted  and  not-gifted  seventh  graders. 
Evidence  for  this  assertion  included  (a)  the  relatively 
large  number. of  NP  items  that  are  common  to  both  the  phi 
and  D  analyses,  (b)  the  large  number  of  items  found  to 
discriminate  between  the  two  groups,  and  (c)  the  agreement 
between  CTBS  and  IRT  location  parameter  estimates  and  NP 
item  statistics. 
Common  Items 

The  index  of  discrimination  (D)  analyses  yielded 
fewer  discriminating  items  than  did  the  phi  analyses. 
However,  the  validity  of  the  D  analyses  is  supported  by 
the  overlap  of  items  between  the  phi  and  D  analyses.  Of 
the  14  items  retained  for  the  NP-D,  11  items  were  also  in- 
cluded in  the  phi  NP  scale.  Overlap  between  phi  and  D 
item  discrimination  for  middle  difficulty  items  was  re- 
ported by  Engelhart  (1965). 

Support  for  the  validity  of  the  NP  also  was  suggested 
by  the  relatively  large  number  of  significant  items,  par- 
ticularly on  the  phi  analyses.  In  addition  to  the  56  sig- 
nificant phi  items  initially  computed  at  ^.05,  another  29 
were  significant  at  the  .10  level.  Clearly  an  even  larger 
number  of  significant  items  would  have  resulted  had  the 
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sample  size  of  117  remained  constant.   However,  the  sample 
size  diminished  below  75  on  99  items.   Of  these  items,  20 
had  phi  values  that  would  have  been  significant  at  <.05 
had  the  N  been  as  high  as  113. 
IRT  Parameters 

A  strong  case  for  the  NP  item  validity  is  made  upon 
investigation  of  item  response  theory  item  location  param- 
eters for  the  CTBS  items.  An  investigation  of  the 
response  patterns  of  subjects  on  both  item  sets,  in  com- 
parison with  the  response  pattern  on  these  items  by  the 
original  CTBS  sample,  suggest  commonality  in  responding. 

til^^tem_l.ocation_£arameter.  Item  response  theory 
was  utilized  by  the  CTBS/TCS  constructors  for  item  selec- 
tion. Item  characteristic  curves  were  plotted  using  a 
three  parameter  logistic  model  involving  (a)  item  dis- 
crimination, (b)  item  location  (or  difficulty),  and  (c)  a 
"guessing"  factor.  A  location  parameter  describes  an 
item's  difficulty  in  terms  of  the  student's  ability  level 
(or  latent  trait).  An  item  discriminates  best  for  a  stu- 
dent whose  ability  level  is  near  the  item's  location 
parameter  (or  difficulty  level).  An  item  with  a  high 
location  parameter  serves  its  function  only  for  students 
of  high  ability  since  low  ability  students  would  not  be 
expected  to  answer  these  items  correctly. 

Comparing  response  patterns.   An  assumption  of  item 
response  theory  is  the  invariance  of  item  parameters 


(Anastasi,  1982;  Baker  1984).  That  is,  given  certain  con- 
ditions, item  parameters  should  be  uniform  among  different 
populations  because  individual  items  are  assumed  to  meas- 
ure the  same  trait  in  different  populations.  In  this 
case,  the  validity  of  the  NP  items  would  be  supported  if 
NP  items  were  common  to  CTBS  items  with  high  location 
parameters.  In  fact,  41  of  the  original  65  NP  items  had 
CTBS  location  parameters  above  the  mean  lor  the  norming 
sample  (CTBS  Technical  Report,  1983).  The  proportion  of 
NP-D  items  to  high  location  parameter  items  was  even 
greater   (16  of   24,    67%). 

In  summary,  the  data  support  an  affirmative  answer  to 
the  first  research  question.  There  exists  a  subset  of 
items  that  discriminated  gifted  and  not-gifted  seventh 
graders  who  were  all  academic  high  achievers.  In  Phase 
II,  cross  validation  with  a  smaller  but  otherwise  equiv- 
alent   sample   was   conducted    to   support   these    findings. 

Phase    II 

Cross  validation  of  Phase  I  findings  was  conducted  in 
Phase  II  by  assessing  the  accuracy  of  NP  items  as  compared 
to  the  SIT  in  classifying  the  gifted  and  not-gifted  stu- 
dents and  the  cutoff  score  at  which  each  procedure  showed 
greatest  accuracy.  These  goals  were  accomplished  by  an- 
alyzing proportions  of  correct  and  incorrect  classifica- 
tions using  coefficient  Kappa  analyses. 
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An  examination  of  total  true  classifications  (i.e., 
correct  classifications  of  gifted  and  not-gifted  students) 
revealed  greatest  prediction  accuracy  at  the  0.0  SD  cut- 
off. At  this  cutoff,  NP-D  and  NP-phi  values  were  equal 
(67.2)  and  slightly  superior  to  the  SIT  (65.6).  When  per- 
centage values  were  corrected  for  chance  occurrences  by 
the  Kappa  analyses,  NP-D  was  slightly  more  accurate  at  the 
+.66  SD  than  at  0.0  SD,  and  NP-D  items  were  superior  to 
NP-phi  and  the  SIT.  Furthermore,  at  the  +.66  cutoff,  the 
Kappa  value  for  the  NP-D  is  .342  contrasted  to  much  lower 
values  for  NP-phi  (.076)  and  SIT  (.204). 

In  answering  the  second  research  question,  data  con- 
sistently supported  the  NP  as  more  accurate  than  the  SIT 
in  classifying  gifted  and  not-gifted  students.  Among  the 
two  NP  tests,  NP-D  was  generally  superior  to  the  NP-phi. 
The  NP-D  was  clearly  the  better  measure  of  true  positive 
and  negative  classifications  based  on  Kappa  analyses. 

There  is  some  ambiguity  regarding  the  overall  most 
accurate  cutoff  for  classifying  students,  rendering  the 
last  research  question  more  difficult  to  answer.  There  is 
empirical  support  for  the  NP-D  at  both  the  -.33  and  +.66 
cutoffs  as  most  accurate  for  classifying  gifted  and  not- 
gifted  seventh  graders  in  the  sample.  Kappa  analyses  sup- 
port that  finding  in  that  the  discrepancy  between  total 
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true  classifications  for  the  NP-D  and  the  SIT  was  greatest 
at  the  -.33  cutoff.  Data  supporting  an  NP-D  cutoff  at 
+.66  may  be  less  clear.  Kappa  analyses  indicate  that  the 
number  of  total  true  classifications  at  +.66  is  superior 
to  the  number  at  any  other  cutoff.  The  relative  strength 
of  the  NP-D  in  total  true  classifications  was  due  to  its 
much  greater  accuracy  in  predicting  true  positives.  The 
SIT  was  more  accurate  in  classifying  true  negatives.  In 
choosing  a  cutoff  score  for  gifted  IQ  screening  purposes, 
specific  classification  priorities  and  goals  must  be  taken 
into  account.  For  example,  if  the  primary  goal  is  to  max- 
imize total  true  classifications,  the  +.66  cutoff  may  be 
desirable. 

Conclusions,  Implications,  and  Limitations 

This  study  yields  conclusions,  implications,  and  lim- 
itations germane  to  both  pragmatic  and  theoretical  issues. 
In  the  following  sections,  the  issues  of  sampling,  gen- 
era 1  izabi  1  ity,  item  validity,  and  cutoff  scores  are 
discussed  individually. 

Sampling 

The  success  of  the  NP  in  classifying  two  groups  of 
students,  who  in  some  respects  appeared  indistinguishable, 
has  some  inherent  implications.  The  gifted  and  not-gifted 
children   were   similar    in   terms   of    achievement,     IQ    (in   many 


8  6 


cases),  and  teacher  perceptions  of  them  as  gifted.  These 
similarities  suggest  that  the  analysis  results  are  robust 
because  the  two  groups  were  accurately  differentiated  in 
spite  of  their  likenesses. 

The  specific  trait  of  the  research  sample  as  being 
high  academic  achievers  distinguishes  this  study  from  many 
others  and  this  distinction  is  essential  to  the  utility  of 
the  NP.  In  most  of  the  related  literature  reviewed  in 
Chapter  II,  research  samples  were  not  as  restricted  in 
range  of  IQ,  rendering  correct  classification  of  gifted 
and  not-gifted  more  likely  in  those.  In  other  studies, 
many  samples  included  students  randomly  selected  from 
general  populations.  Those  students,  excluded  in  this 
study  by  preselection,  were  readily  screened  as  not-gifted 
in  other  studies,  allowing  for  artificially  inflated  accu- 
racy predictions. 

A  pertinent  limitation  to  this  study  discussed  in 
Chapter  III  was  the  sampling  procedure.  Specifically, 
some  students  were  deleted  from  formal  IQ  testing,  and 
therefore,  excluded  from  the  sample  because  they  scored 
below  the  SIT  screening  cutoff  of  132.  However,  one-sixth 
(i.e.,  11  of  61)  Phase  II  students  had  SIT  scores  below 
132,  with  some  scores  falling  in  the  average  and  below 
average  range.  This  occurrence  raises  a  question  about 
the  criterion  used  for  disqualifying  students  from  IQ 


87 

testing  based  on  their  SIT  scores.  It  is  unclear  how  stu- 
dents with  SIT  scores  in  the  80s  were  not  disqualified 
while  others  were.  It  seems  that  screening  procedures 
were  not  followed  consistently  at  the  school  level.  For- 
tunately, even  though  there  were  relatively  few  students 
with  SIT  IQs  below  132,  there  were  enough  for  data  analy- 
ses, and  the  range  of  scores  was  wide.  In  this  case, 
sampling  error  probably  was  not  a  confounding  influence  on 
results  because  disqualification  of  the  aforementioned 
students  seems  to  have  occurred  in  a  random,  unsystematic 
manner . 

General izability 

In  large  part,  outcomes  regarding  screening  accuracy, 
item  validity,  and  cutoff  scores  may  not  be  generalizable 
to  other  populations  at  this  time.  However,  the  success 
of  the  research  procedure  (NP)  is  considered  meaningful  in 
as  much  as  the  accuracy  of  the  NP  will  be  tested  contin- 
ually in  its  practical  application  on  potentially  gifted 
students.  in  this  way,  the  generalizabil ity  of  screening 
accuracy,  item  validity,  and  cutoff  scores  will  be  val- 
idated on  other  populations. 

In  Chapter  I  the  definition  of  intellectual  gifted- 
ness  for  this  study  was  briefly  discussed.  This  pragmatic 
definition  focuses  only  on  IQ  test  scores  that  fall  above 
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a  particular  cutoff  score.  Other,  more  theoretical  def- 
initions, may  involve  other  qualitative  differences. 
Results  of  this  study  may  not  be  general izable  to  students 
who  are  designated  gifted  using  criteria  other  than  their 
IQs  falling  above  the  96th  percentile. 

Screening  Accuracy 

Generalizability  of  research  results  would  be  en- 
hanced by  analyzing  group  test  data  for  other  restricted 
populations  to  determine  if  subgroups  may  be  successfully 
discriminator]  using  test  items.  Two  preliminary  steps  in 
such  a  process  would  be  to  (a)  analyze  test  results  of 
other  populations  of  potentially  gifted  students,  such  as 
elementary  or  high  school,  on  the  CTBS/TCS,  and  (b)  deter- 
mine if  other  group  achievement/cognitive  ability  tests 
possess  items  that  accurately  classify  potentially  gifted 
students.  Further,  researchers  may  seek  to  generalize 
results  on  more  divergent  restricted  populations.  For 
example,  it  may  be  useful  to  screen  for  mild  mental  retar- 
dation among  remedial  students  or  for  learning  disabil- 
ities among  children  with  discrepant  report  card  grades. 
Item  Validity 

A  somewhat  unexpected  result  of  this  research  was 
that  the  CTBS  items  (which  supposedly  measure  school 
achievement)  contributed  much  more  to  the  NP  than  did  the 
TCS  items  (which  are  purported  to  measure  intelligence). 
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Indeed,  none  of  the  NP-D  items  were  obtained  from  the  TCS. 
The  NP-phi  test,  which  initially  included  eight  TCS  items, 
was  found  to  be  generally  less  accurate  in  classifying 
students  than  the  NP-D.  Essentially,  the  achievement 
items  had  greater  criterion  related  validity  than  did  the 
cognitive  ability  items  for  Full  Scale  IQ. 

Ostensibly,  the  superior  criterion  validity  of  the 
CTBS  items  is  contrary  to  expectation,  however  a  closer 
examination  suggests  that  this  pattern  may  support  rather 
than  refute  the  general izabi 1 ity  of  research  findings. 
Examination  of  CTBS,  TCS,  and  WISC-R  subtests  reveals  that 
the  CTBS  probably  has  more  in  common  with  the  WISC-R  than 
does  the  TCS.  For  example,  CTBS  subtests  such  as  Mathe- 
matics, Science,  Social  Studies,  and  Vocabulary  have  What 
appear  to  be  direct  correlates  on  the  WISC-R.  The  rela- 
tionship between  TCS  and  WISC-R  subtests  appears  more  ob- 
scure. This  assertion  is  supported  by  Wurster  (1985)  who 
compared  overlap  between  the  TCS  and  WISC-R  with  the  SIT 
and  the  WISC-R.  She  found  that  87.8%  of  the  SIT  items 
measured  the  same  skills  as  the  WISC-R  Verbal  Scale,  how- 
ever, "no  items  from  the  TCS  appeared  to  measure  any  of 
the  skills  that  are  measured  by  the  11  WISC-R  subtests" 
(p.  24). 

The  relationship  of  commonality  between  IQ  test  and 
achievement  test  performance  for  this  sample  seems  related 
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to  two  factors.  First,  it  is  likely  that  for  high  achiev- 
ing students,  "superior"  intelligence  is  heavily  loaded  in 
the  verbal  reasoning  domains  that  are  tapped  to  a  greater 
extent  by  the  CTBS  than  by  the  TCS.  The  TCS,  to  a  greater 
degree  than  the  CTBS,  measures  non-verbal  skills  or  abil- 
ities that  may  be  less  represented  by  high  achieving  stu- 
dents' strengths'  on  the  WISC-R  or  Stanford  -Binet.  Future 
researchers  might  investigate  this  hypothesis  by  examining 
the  relationship  between  achievement  and  verbal  vs.  non- 
verbal intelligence  (as  represented  by  these  or  similar 
tests)  for  high  achieving  or  gifted  students. 

A  second  explanation  for  the  disproportionate  repre- 
sentation of  achievement  test  items  on  the  NP  concerns  the 
theoretical  rationale  for  this  research.  That  is,  the 
content  of  achievement  tests  (e.g.,  CTBS)  and  IQ  tests 
(e.g.,  WISC-R)  are  often  operationally  indistinguishable. 
This  premise  has  been  supported  empirically  by  Anastasi 
(1982),  among  many  others  (see  Chapter  II). 
Cutoffs 

Conclusions  regarding  the  most  desirable  cutoff  for 
the  NP  (NP-D  at  -.33  or  +.66)  were  discussed  earlier  in 
the  chapter.  There  are  some  other  conclusions  that  may  be 
drawn  regarding  cutoff  scores  for  the  SIT  with  this  popu- 
lation. The  mean  SIT  score  for  the  Phase  II  sample  was 
137.  This  score  is  somewhat  above  what  would  be  expected 
yet  consistent  with  research  findings  by  Karnes  and  Brown 
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(1979)  and  Rust  and  Lose  (1980)  who  found  that  the  SIT 
tends  to  overestimate  high  IQs  on  full  length  tests.  In 
their  studies,  those  researchers  recommended  setting  SIT 
cutoff  scores  for  gifted  classification  higher  than  two 
standard  deviations  above  the  mean  to  offset  this  tendency 
to  overestimate  the  IQs  of  brighter  students.  The  current 
findings  support  those  recommendations  because  the  SIT  was 
most  accurate  in  classifying  true  positive  and  false  neg- 
ative findings  at  over  three  standard  deviations  above  the 
mean  (SIT  IQ  =  149). 

Summation 

In  his  later  years  Edwin  R.  Guthrie  (1959)  suggested 
that  research  has  no  inherent  value,  but  rather  that  its 
value  was  gained  from  its  practical  applications.  It  is 
in  the  spirit  of  that  philosophy  that  this  research  may  be 
fully  appreciated.  Relatively  accurate  in  its  predictive 
power  and  efficient  in  its  method,  the  NP  analysis  will  be 
easily  replicated  on  new  student  samples  and  test  formats. 
Thus,  the  NP  may  be  best  viewed  as  a  procedure  adaptable 
to  varying,  yet  specific,  needs. 
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